<!DOCTYPE html>
<html lang="zh">
<head>
<meta http-equiv="content-type" content="text/html;charset=utf-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<meta name="description" content="这是变压器和相关技术的 PyTorch 实现/教程的集合。"/>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:image:src" content="https://avatars1.githubusercontent.com/u/64068543?s=400&amp;v=4"/>
<meta name="twitter:title" content="变压器"/>
<meta name="twitter:description" content="这是变压器和相关技术的 PyTorch 实现/教程的集合。"/>
<meta name="twitter:site" content="@labmlai"/>
<meta name="twitter:creator" content="@labmlai"/>
<meta property="og:url" content="https://nn.labml.ai/transformers/index.html"/>
<meta property="og:title" content="变压器"/>
<meta property="og:image" content="https://avatars1.githubusercontent.com/u/64068543?s=400&amp;v=4"/>
<meta property="og:site_name" content="变压器"/>
<meta property="og:type" content="object"/>
<meta property="og:title" content="变压器"/>
<meta property="og:description" content="这是变压器和相关技术的 PyTorch 实现/教程的集合。"/>
<title>变压器</title>
<link rel="shortcut icon" href="/icon.png"/>
<link rel="stylesheet" href="../pylit.css?v=1">
<link rel="canonical" href="https://nn.labml.ai/transformers/index.html"/>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.13.18/dist/katex.min.css" integrity="sha384-zTROYFVGOfTw7JV7KUu8udsvW2fx4lWOsCEDqhBreBwlHI4ioVRtmIvEThzJHGET" crossorigin="anonymous">
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-4V3HC8HBLH"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag() {
dataLayer.push(arguments);
}
gtag('js', new Date());
gtag('config', 'G-4V3HC8HBLH');
</script>
</head>
<body>
<div id='container'>
<div id="background"></div>
<div class='section'>
<div class='docs'>
<p>
<a class="parent" href="/">home</a>
<a class="parent" href="index.html">transformers</a>
</p>
<p>
<a href="https://github.com/labmlai/annotated_deep_learning_paper_implementations" target="_blank">
<img alt="Github"
src="https://img.shields.io/github/stars/labmlai/annotated_deep_learning_paper_implementations?style=social"
style="max-width:100%;"/></a>
<a href="https://twitter.com/labmlai" rel="nofollow" target="_blank">
<img alt="Twitter"
src="https://img.shields.io/twitter/follow/labmlai?style=social"
style="max-width:100%;"/></a>
</p>
<p>
<a href="https://github.com/labmlai/annotated_deep_learning_paper_implementations/tree/master/labml_nn/transformers/__init__.py" target="_blank">
View code on Github</a>
</p>
</div>
</div>
<div class='section' id='section-0'>
<div class='docs doc-strings'>
<div class='section-link'>
<a href='#section-0'>#</a>
</div>
<h1>Transformers</h1>
<p>This module contains <a href="https://pytorch.org/">PyTorch</a> implementations and explanations of the original transformer from the paper <a href="https://papers.labml.ai/paper/1706.03762">Attention Is All You Need</a>, and of its derivatives and enhancements. A minimal sketch of the attention computation at the core of these modules follows the list below.</p>
<ul><li><a href="mha.html">Multi-head attention</a></li>
<li><a href="models.html">Transformer encoder and decoder models</a></li>
<li><a href="feed_forward.html">Position-wise feed-forward network (FFN)</a></li>
<li><a href="positional_encoding.html">Fixed positional encoding</a></li></ul>
<h2><a href="xl/index.html">变压器 XL</a></h2>
<p>这使用<a href="xl/relative_mha.html">相对的多头注意力</a>实现了变形金刚 XL 模型</p>
<h2><a href="rope/index.html">旋转位置嵌入</a></h2>
<p>这实现了旋转位置嵌入 (roPE)</p>
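<p>As an illustrative sketch (using the interleaved-pair convention; the annotated implementation may pair feature dimensions differently), RoPE rotates each pair of feature dimensions by an angle proportional to the token's position:</p>
<div class="highlight"><pre>import torch

def rotary_embedding(x, base=10000):
    # x: [batch, seq_len, d] with d even
    d = x.shape[-1]
    pos = torch.arange(x.shape[1], dtype=x.dtype)
    # One rotation frequency per feature pair: base^(-2i/d)
    theta = base ** (-torch.arange(0, d, 2, dtype=x.dtype) / d)
    angles = pos[:, None] * theta[None, :]      # [seq_len, d/2]
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin        # 2D rotation of each pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out</pre></div>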
<h2><a href="alibi/index.html">注意线性偏差</a></h2>
<p>这实现了线性偏差注意力AliBI</p>
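<p>A minimal sketch of the bias matrix, assuming the paper's slope recipe for a power-of-two number of heads; it is added to the attention logits in place of positional embeddings:</p>
<div class="highlight"><pre>import torch

def alibi_bias(n_heads, seq_len):
    # Head-specific slopes form the geometric sequence 2^-(8/n), 2^-(16/n), ...
    slopes = torch.tensor([2.0 ** (-8.0 * (i + 1) / n_heads) for i in range(n_heads)])
    pos = torch.arange(seq_len)
    rel = pos[None, :] - pos[:, None]   # key position minus query position
    # [n_heads, seq_len, seq_len]; negative (a penalty) for past positions
    return slopes[:, None, None] * rel[None, :, :]</pre></div>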
<h2><a href="retro/index.html">复古</a></h2>
<p>这实现了检索增强型转换器RETRO</p>
<h2><a href="compressive/index.html">压缩变压器</a></h2>
<p>这是一种压缩变压器的实现,它通过压缩最古老的存储<a href="xl/index.html">器来延长注意力跨度从而在Transformer XL</a> 上扩展。</p>
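<p>One of the compression functions from the paper is a strided 1D convolution over the oldest memories; a rough sketch, assuming memories laid out as <code>[seq_len, batch, d_model]</code>:</p>
<div class="highlight"><pre>import torch.nn as nn

class Conv1dCompression(nn.Module):
    # Compress every `ratio` consecutive memory vectors into one
    def __init__(self, d_model: int, ratio: int):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=ratio, stride=ratio)

    def forward(self, mem):
        # mem: [seq_len, batch, d_model] -> [seq_len // ratio, batch, d_model]
        x = mem.permute(1, 2, 0)   # Conv1d wants [batch, channels, seq_len]
        x = self.conv(x)
        return x.permute(2, 0, 1)</pre></div>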
<h2><a href="gpt/index.html">GPT 架构</a></h2>
<p>这是 GPT-2 体系结构的实现。</p>
<h2><a href="glu_variants/simple.html">GLU 变体</a></h2>
<p>这是论文 <a href="https://papers.labml.ai/paper/2002.05202">GLU 变体改进变压器的</a>实现。</p>
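<p>For example, the GEGLU variant replaces the first linear-plus-activation of the standard FFN with a GELU-gated product; a sketch (layer sizes are illustrative):</p>
<div class="highlight"><pre>import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLUFFN(nn.Module):
    # FFN_GEGLU(x) = (GELU(x W1) * x V) W2
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff)
        self.v = nn.Linear(d_model, d_ff)
        self.w2 = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.w2(F.gelu(self.w1(x)) * self.v(x))</pre></div>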
<h2><a href="knn/index.html">knn-lm</a></h2>
<p>这是论文《<a href="https://papers.labml.ai/paper/1911.00172">通过记忆推广:最近邻语言模型</a>》的实现。</p>
<h2><a href="feedback/index.html">反馈变压器</a></h2>
<p>这是一篇论文《使用<a href="https://papers.labml.ai/paper/2002.09402">反馈存储器访问顺序变压器中的更高层次表示》的</a>实现。</p>
<h2><a href="switch/index.html">开关变压器</a></h2>
<p>这是论文《<a href="https://papers.labml.ai/paper/2101.03961">开关变压器:以简单高效的稀疏度缩放到万亿参数模型</a>》的微型实现。我们的实现只有几百万个参数,不对并行分布式训练进行建模。它进行单个 GPU 训练,但我们实现了白皮书中描述的切换概念。</p>
<h2><a href="fast_weights/index.html">快速重量变压器</a></h2>
<p>这是 <a href="https://papers.labml.ai/paper/2102.11174">PyTorch 中线性变压器是秘密的快速重量存储系统论文的</a>实现。</p>
<h2><a href="fnet/index.html">FNet将令牌与傅里叶变换混合</a></h2>
<p>这是论文《<a href="https://papers.labml.ai/paper/2105.03824">FNet将令牌与傅里叶变换混合</a>》的实现。</p>
<h2><a href="aft/index.html">免注意变压器</a></h2>
<p>这是论文《<a href="https://papers.labml.ai/paper/2105.14103">无注意力变压器》的</a>实现。</p>
<h2><a href="mlm/index.html">屏蔽语言模型</a></h2>
<p>这是在论文《B <a href="https://papers.labml.ai/paper/1810.04805">ERT用于语言理解的深度双向变换器的预训练》中用于预训练的蒙面语言模型的</a>实现。</p>
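<p>A sketch of the BERT masking recipe: roughly 15% of tokens become prediction targets; of those, 80% are replaced with <code>[MASK]</code>, 10% with a random token, and 10% are left unchanged (the ratios are the paper's; the helper below is illustrative):</p>
<div class="highlight"><pre>import torch

def mask_tokens(tokens, mask_id, vocab_size, p=0.15):
    # tokens: [seq_len] token ids; returns (masked input, labels for the loss)
    labels = tokens.clone()
    target = torch.rand(tokens.shape).lt(p)      # positions to predict
    labels[~target] = -100                       # ignored by the loss
    masked = tokens.clone()
    r = torch.rand(tokens.shape)
    masked[torch.logical_and(target, r.lt(0.8))] = mask_id   # 80%: [MASK]
    rand_pos = torch.logical_and(target, torch.logical_and(r.ge(0.8), r.lt(0.9)))
    masked[rand_pos] = torch.randint(vocab_size, tokens.shape)[rand_pos]  # 10%: random
    return masked, labels                        # remaining 10%: left unchanged</pre></div>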
<h2><a href="mlp_mixer/index.html">MLP 混音器:面向视觉的全 MLP 架构</a></h2>
<p>这是论文 <a href="https://papers.labml.ai/paper/2105.01601">MLP-Mixer视觉的全 MLP 架构的</a>实现。</p>
<h2><a href="gmlp/index.html">注意 MLP (gMLP)</a></h2>
<p>这是 “<a href="https://papers.labml.ai/paper/2105.08050">注意 MLP” 一文的</a>实现。</p>
<h2><a href="vit/index.html">视觉变压器 (ViT)</a></h2>
<p>这是论文《<a href="https://papers.labml.ai/paper/2010.11929">图像值得 16x16 Words大规模图像识别的变形金刚》的</a>实现。</p>
<h2><a href="primer_ez/index.html">Primer</a></h2>
<p>这是论文《入<a href="https://papers.labml.ai/paper/2109.08668">门:为语言建模寻找高效的变换器》的</a>实现。</p>
<h2><a href="hour_glass/index.html">沙漏</a></h2>
<p>这是论文《<a href="https://papers.labml.ai/paper/2110.13711">分层变换器是更有效的语言模型</a>》的实现</p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">112</span><span></span><span class="kn">from</span> <span class="nn">.configs</span> <span class="kn">import</span> <span class="n">TransformerConfigs</span>
<span class="lineno">113</span><span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">TransformerLayer</span><span class="p">,</span> <span class="n">Encoder</span><span class="p">,</span> <span class="n">Decoder</span><span class="p">,</span> <span class="n">Generator</span><span class="p">,</span> <span class="n">EncoderDecoder</span>
<span class="lineno">114</span><span class="kn">from</span> <span class="nn">.mha</span> <span class="kn">import</span> <span class="n">MultiHeadAttention</span>
<span class="lineno">115</span><span class="kn">from</span> <span class="nn">labml_nn.transformers.xl.relative_mha</span> <span class="kn">import</span> <span class="n">RelativeMultiHeadAttention</span></pre></div>
</div>
</div>
<div class='footer'>
<a href="https://papers.labml.ai">Trending Research Papers</a>
<a href="https://labml.ai">labml.ai</a>
</div>
</div>
<script src="../interactive.js?v=1"></script>
<script>
function handleImages() {
var images = document.querySelectorAll('p>img')
for (var i = 0; i < images.length; ++i) {
handleImage(images[i])
}
}
function handleImage(img) {
img.parentElement.style.textAlign = 'center'
var modal = document.createElement('div')
modal.id = 'modal'
var modalContent = document.createElement('div')
modal.appendChild(modalContent)
var modalImage = document.createElement('img')
modalContent.appendChild(modalImage)
var span = document.createElement('span')
span.classList.add('close')
span.textContent = 'x'
modal.appendChild(span)
img.onclick = function () {
console.log('clicked')
document.body.appendChild(modal)
modalImage.src = img.src
}
span.onclick = function () {
document.body.removeChild(modal)
}
}
handleImages()
</script>
</body>
</html>