
<!DOCTYPE html>
<html lang="zh">
<head>
<meta http-equiv="content-type" content="text/html;charset=utf-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<meta name="description" content="这是一个包含 Transformers 及相关技术的 PyTorch 实现和教程的合集。"/>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:image:src" content="https://avatars1.githubusercontent.com/u/64068543?s=400&amp;v=4"/>
<meta name="twitter:title" content="Transformers"/>
<meta name="twitter:description" content="这是一个包含 Transformers 及相关技术的 PyTorch 实现和教程的合集。"/>
<meta name="twitter:site" content="@labmlai"/>
<meta name="twitter:creator" content="@labmlai"/>
<meta property="og:url" content="https://nn.labml.ai/transformers/index.html"/>
<meta property="og:title" content="Transformers"/>
<meta property="og:image" content="https://avatars1.githubusercontent.com/u/64068543?s=400&amp;v=4"/>
<meta property="og:site_name" content="Transformers"/>
<meta property="og:type" content="object"/>
<meta property="og:title" content="Transformers"/>
<meta property="og:description" content="这是一个包含 Transformers 及相关技术的 PyTorch 实现和教程的合集。"/>
<title>Transformers</title>
<link rel="shortcut icon" href="/icon.png"/>
<link rel="stylesheet" href="../pylit.css?v=1">
<link rel="canonical" href="https://nn.labml.ai/transformers/index.html"/>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.13.18/dist/katex.min.css" integrity="sha384-zTROYFVGOfTw7JV7KUu8udsvW2fx4lWOsCEDqhBreBwlHI4ioVRtmIvEThzJHGET" crossorigin="anonymous">
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-4V3HC8HBLH"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag() {
dataLayer.push(arguments);
}
gtag('js', new Date());
gtag('config', 'G-4V3HC8HBLH');
</script>
</head>
<body>
<div id='container'>
<div id="background"></div>
<div class='section'>
<div class='docs'>
<p>
<a class="parent" href="/">home</a>
<a class="parent" href="index.html">transformers</a>
</p>
<p>
<a href="https://github.com/labmlai/annotated_deep_learning_paper_implementations" target="_blank">
<img alt="Github"
src="https://img.shields.io/github/stars/labmlai/annotated_deep_learning_paper_implementations?style=social"
style="max-width:100%;"/></a>
<a href="https://twitter.com/labmlai" rel="nofollow" target="_blank">
<img alt="Twitter"
src="https://img.shields.io/twitter/follow/labmlai?style=social"
style="max-width:100%;"/></a>
</p>
<p>
<a href="https://github.com/labmlai/annotated_deep_learning_paper_implementations/tree/master/labml_nn/transformers/__init__.py" target="_blank">
View code on Github</a>
</p>
</div>
</div>
<div class='section' id='section-0'>
<div class='docs doc-strings'>
<div class='section-link'>
<a href='#section-0'>#</a>
</div>
<h1>Transformers</h1>
<p>This section contains explanations and <a href="https://pytorch.org/">PyTorch</a> implementations of the original Transformer from the paper <a href="https://arxiv.org/abs/1706.03762">Attention Is All You Need</a>, as well as of its derivatives and enhancements. The list below links to the individual components; a minimal sketch of the attention operation they share follows it.</p>
<ul><li><a href="mha.html">多头注意力</a></li>
<li><a href="models.html">Transformer 编码器和解码器模型</a></li>
<li><a href="feed_forward.html">位置前馈网络 (FFN)</a></li>
<li><a href="positional_encoding.html">固定位置编码</a></li></ul>
<h2><a href="xl/index.html">Transformer XL</a></h2>
<p>This is an implementation of the Transformer XL model, which uses <a href="xl/relative_mha.html">relative multi-head attention</a>.</p>
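<p>The defining trick of Transformer XL is segment-level recurrence: states cached from the previous segment are attended to but kept out of the gradient graph. A rough sketch of that idea, where <code>attn</code> is a placeholder for any attention module:</p>
<div class="highlight"><pre>import torch

def attend_with_memory(x, mem, attn):
    # x:   current segment embeddings            (seq, batch, d_model)
    # mem: states cached from the previous segment
    ctx = torch.cat([mem, x], dim=0)
    out = attn(query=x, key=ctx, value=ctx)
    # Detach so gradients never flow across segment boundaries
    new_mem = x.detach()
    return out, new_mem</pre></div>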
<h2><a href="rope/index.html">旋转式位置编码</a></h2>
<p>This is an implementation of Rotary Positional Embeddings (RoPE).</p>
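<p>RoPE rotates pairs of query and key features by position-dependent angles, so their dot product depends only on relative position. A minimal sketch (the pairing here groups even and odd features; implementations differ in layout, but the property holds as long as queries and keys use the same one):</p>
<div class="highlight"><pre>import torch

def rope(x, base=10000):
    # x: (seq, batch, heads, d); rotate feature pairs by position-dependent angles
    seq, d = x.shape[0], x.shape[-1]
    theta = base ** (-torch.arange(0, d, 2).float() / d)           # (d/2,)
    angle = torch.arange(seq).float()[:, None] * theta[None, :]    # (seq, d/2)
    cos = angle.cos()[:, None, None, :]
    sin = angle.sin()[:, None, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # Standard 2-D rotation applied to every (x1, x2) pair
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)</pre></div>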
<h2><a href="alibi/index.html">线性偏差注意力</a></h2>
<p>This is an implementation of Attention with Linear Biases (ALiBi).</p>
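<p>ALiBi drops positional embeddings and instead adds a head-specific linear penalty on key-query distance to the attention scores. A sketch of the bias matrix (the slope formula assumes the number of heads is a power of two, as in the paper):</p>
<div class="highlight"><pre>import torch

def alibi_biases(n_heads, seq_len):
    # Head h gets slope 2^(-8h / n_heads), a geometric sequence
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads)
                           for h in range(n_heads)])
    pos = torch.arange(seq_len)
    # Relative distance of key position j from query position i
    dist = pos[None, :] - pos[:, None]            # (seq, seq)
    # Add this to the attention scores; causal masking is applied separately
    return slopes[:, None, None] * dist[None]     # (heads, seq, seq)</pre></div>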
<h2><a href="retro/index.html">RETRO</a></h2>
<p>This is an implementation of the Retrieval-Enhanced Transformer (RETRO).</p>
<h2><a href="compressive/index.html">压缩 Transformer</a></h2>
<p>This is an implementation of the Compressive Transformer, which extends <a href="xl/index.html">Transformer XL</a> by compressing the oldest memories to give attention a longer span.</p>
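<p>One of the compression functions in the paper is a strided 1-D convolution that shrinks the oldest memories by a factor <code>c</code>. An illustrative sketch (class name is ours):</p>
<div class="highlight"><pre>import torch.nn as nn

class MemoryCompressor(nn.Module):
    # Compress old memories by a factor c with a strided 1-D convolution
    def __init__(self, d_model, c=4):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=c, stride=c)

    def forward(self, mem):
        # mem: (seq, batch, d_model) becomes (seq // c, batch, d_model)
        x = mem.permute(1, 2, 0)   # (batch, d_model, seq)
        return self.conv(x).permute(2, 0, 1)</pre></div>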
<h2><a href="gpt/index.html">GPT 架构</a></h2>
<p>这是 GPT-2 结构的实现。</p>
<h2><a href="glu_variants/simple.html">GLU 变体</a></h2>
<p>这是论文 <a href="https://arxiv.org/abs/2002.05202">《 GLU Variants Improve Transformer 》</a>的实现。</p>
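<p>The paper replaces the FFN's first linear layer and activation with a gated linear unit. A sketch of one of its variants, SwiGLU (following the paper, the linear layers omit biases):</p>
<div class="highlight"><pre>import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFFN(nn.Module):
    # FFN_SwiGLU(x) = (Swish(x W1) * (x V)) W2
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff, bias=False)
        self.v = nn.Linear(d_model, d_ff, bias=False)
        self.w2 = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x):
        return self.w2(F.silu(self.w1(x)) * self.v(x))</pre></div>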
<h2><a href="knn/index.html">kNN-LM</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/1911.00172">《 Generalization through Memorization: Nearest Neighbor Language Models 》</a>的实现。</p>
<h2><a href="feedback/index.html">自反馈 Transformer</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/2002.09402">《 Accessing Higher-level Representations in Sequential Transformers with Feedback Memory 》</a>的实现。</p>
<h2><a href="switch/index.html">Switch Transformer</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/2101.03961">《 Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity 》</a>的一个简化实现。我们的实现仅包含几百万个参数,并且只在单 GPU 上进行训练,不涉及并行分布式训练,但我们仍然实现了论文中描述的 Switch 概念。</p>
<h2><a href="fast_weights/index.html">快速权重 Transformer</a></h2>
<p>这是论文 <a href="https://arxiv.org/abs/2102.11174">《 Linear Transformers Are Secretly Fast Weight Memory Systems in PyTorch 》</a>的实现。</p>
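<p>The paper's view: a linear-attention step is equivalent to writing a value under a key into a "fast weight" matrix with an outer product, then reading it back with the query. A sketch of the basic additive update (the paper refines this with a delta rule):</p>
<div class="highlight"><pre>import torch

def fast_weight_step(W, k, v, q):
    # Write value v under (feature-mapped) key k ...
    W = W + torch.einsum('bi,bj->bij', v, k)
    # ... then read the memory with query q
    out = torch.einsum('bij,bj->bi', W, q)
    return out, W</pre></div>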
<h2><a href="fnet/index.html">Fnet使用傅里叶变换混合 token </a></h2>
<p>这是论文<a href="https://arxiv.org/abs/2105.03824">《 FNet: Mixing Tokens with Fourier Transforms 》</a>的实现。</p>
<h2><a href="aft/index.html">无注意力 Transformer</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/2105.14103">《 An Attention Free Transformer 》</a>的实现。</p>
<h2><a href="mlm/index.html">掩码语言模型</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/1810.04805">《 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 》</a>中用于预训练的掩码语言模型的实现</p>
<h2><a href="mlp_mixer/index.html">MLP-Mixer一种用于视觉的全 MLP 架构</a></h2>
<p>这是论文 <a href="https://arxiv.org/abs/2105.01601">《 MLP-Mixer: An all-MLP Architecture for Vision 》</a>的实现。</p>
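<p>A Mixer block alternates an MLP applied across patches (token mixing) with an MLP applied across channels, each behind a residual connection. A compact sketch:</p>
<div class="highlight"><pre>import torch.nn as nn

class MixerBlock(nn.Module):
    def __init__(self, n_tokens, d_model, d_token, d_channel):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.token_mlp = nn.Sequential(
            nn.Linear(n_tokens, d_token), nn.GELU(), nn.Linear(d_token, n_tokens))
        self.norm2 = nn.LayerNorm(d_model)
        self.channel_mlp = nn.Sequential(
            nn.Linear(d_model, d_channel), nn.GELU(), nn.Linear(d_channel, d_model))

    def forward(self, x):
        # x: (batch, tokens, d_model); transpose so the MLP mixes across tokens
        x = x + self.token_mlp(self.norm1(x).transpose(1, 2)).transpose(1, 2)
        return x + self.channel_mlp(self.norm2(x))</pre></div>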
<h2><a href="gmlp/index.html">门控多层感知器 (gMLP)</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/2105.08050">《 Pay Attention to MLPs 》</a>的实现。</p>
<h2><a href="vit/index.html">视觉 Transformer (ViT)</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/2010.11929">《 An Image Is Worth 16x16 Words: Transformers For Image Recognition At Scale 》</a>的实现。</p>
<h2><a href="primer_ez/index.html">Primer</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/2109.08668">《 Primer: Searching for Efficient Transformers for Language Modeling 》</a>的实现。</p>
<h2><a href="hour_glass/index.html">沙漏网络</a></h2>
<p>这是论文<a href="https://arxiv.org/abs/2110.13711">《 Hierarchical Transformers Are More Efficient Language Models 》</a>的实现</p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">112</span><span></span><span class="kn">from</span> <span class="nn">.configs</span> <span class="kn">import</span> <span class="n">TransformerConfigs</span>
<span class="lineno">113</span><span class="kn">from</span> <span class="nn">.models</span> <span class="kn">import</span> <span class="n">TransformerLayer</span><span class="p">,</span> <span class="n">Encoder</span><span class="p">,</span> <span class="n">Decoder</span><span class="p">,</span> <span class="n">Generator</span><span class="p">,</span> <span class="n">EncoderDecoder</span>
<span class="lineno">114</span><span class="kn">from</span> <span class="nn">.mha</span> <span class="kn">import</span> <span class="n">MultiHeadAttention</span>
<span class="lineno">115</span><span class="kn">from</span> <span class="nn">labml_nn.transformers.xl.relative_mha</span> <span class="kn">import</span> <span class="n">RelativeMultiHeadAttention</span></pre></div>
</div>
</div>
<div class='footer'>
<a href="https://labml.ai">labml.ai</a>
</div>
</div>
<script src="../interactive.js?v=1"></script>
<script>
function handleImages() {
var images = document.querySelectorAll('p>img')
for (var i = 0; i < images.length; ++i) {
handleImage(images[i])
}
}
function handleImage(img) {
img.parentElement.style.textAlign = 'center'
var modal = document.createElement('div')
modal.id = 'modal'
var modalContent = document.createElement('div')
modal.appendChild(modalContent)
var modalImage = document.createElement('img')
modalContent.appendChild(modalImage)
var span = document.createElement('span')
span.classList.add('close')
span.textContent = 'x'
modal.appendChild(span)
img.onclick = function () {
console.log('clicked')
document.body.appendChild(modal)
modalImage.src = img.src
}
span.onclick = function () {
document.body.removeChild(modal)
}
}
handleImages()
</script>
</body>
</html>