Mirror of https://github.com/labmlai/annotated_deep_learning_paper_implementations.git (synced 2025-08-14 01:13:00 +08:00).

This commit updates the Colab notebook links in the generated documentation from the old lab-ml/nn repository path to labmlai/annotated_deep_learning_paper_implementations.
@@ -99,7 +99,7 @@ self-attention, and the pass-through in the residual connection is not normalize
 This is supposed to be more stable in standard transformer setups.</p>
 <p>Here are <a href="experiment.html">the training code</a> and a notebook for training a compressive transformer
 model on the Tiny Shakespeare dataset.</p>
-<p><a href="https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/transformers/compressive/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
+<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/compressive/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
 <a href="https://app.labml.ai/run/0d9b5338726c11ebb7c80242ac1c0002"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen" /></a></p>
 </div>
 <div class='code'>
@@ -99,7 +99,7 @@ self-attention, and the pass-through in the residual connection is not normalize
 This is supposed to be more stable in standard transformer setups.</p>
 <p>Here are <a href="https://nn.labml.ai/transformers/compressive/experiment.html">the training code</a> and a notebook for training a compressive transformer
 model on the Tiny Shakespeare dataset.</p>
-<p><a href="https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/transformers/compressive/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
+<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/compressive/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
 <a href="https://app.labml.ai/run/0d9b5338726c11ebb7c80242ac1c0002"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen" /></a></p>
 </div>
 <div class='code'>
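The compressive transformer notes in the two hunks above say the pass-through in the residual connection is not normalized (a pre-layer-norm setup). As an illustrative aside, and not code from this commit, here is a minimal PyTorch sketch of that pre-norm residual pattern; the class and argument names are hypothetical:

```python
import torch
import torch.nn as nn

class PreNormBlock(nn.Module):
    """Sketch of a pre-layer-norm residual block.

    Only the input to the sub-layer is normalized; the residual
    pass-through is left untouched, which is the property the
    compressive transformer notes describe as more stable.
    """
    def __init__(self, d_model: int, sublayer: nn.Module):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.sublayer = sublayer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize only the branch feeding the sub-layer;
        # the residual connection bypasses the normalization.
        return x + self.sublayer(self.norm(x))
```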
@@ -70,7 +70,7 @@
 <h1>Train Fast Weights Transformer</h1>
 <p>This trains a fast weights transformer model for auto-regression.</p>
 <p>Here’s a Colab notebook for training a fast weights transformer on Tiny Shakespeare dataset.</p>
-<p><a href="https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/transformers/fast_weights/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
+<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/fast_weights/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
 <a href="https://app.labml.ai/run/928aadc0846c11eb85710242ac1c0002"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen" /></a></p>
 </div>
 <div class='code'>
@@ -139,7 +139,7 @@ a new update rule for $\color{cyan}{W^{(i)}} = f(\color{cyan}{W^{(i-1)}})$ and c
 $\frac{1}{z^{(i)} \cdot \color{lightgreen}{\phi(q^{(i)})}}$</p>
 <p>Here are <a href="experiment.html">the training code</a> and a notebook for training a fast weights
 transformer on the Tiny Shakespeare dataset.</p>
-<p><a href="https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/transformers/fast_weights/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
+<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/fast_weights/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
 <a href="https://app.labml.ai/run/928aadc0846c11eb85710242ac1c0002"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen" /></a></p>
 </div>
 <div class='code'>
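The hunk header above quotes the fast-weight update rule $W^{(i)} = f(W^{(i-1)})$ and the retrieval normalization $\frac{1}{z^{(i)} \cdot \phi(q^{(i)})}$. For orientation only, here is a rough sketch of one delta-rule-style fast-weight step under those definitions; the function name, the exact write rule, and the feature map `phi` are assumptions, and the repo's annotated implementation remains authoritative:

```python
import torch

def fast_weight_step(W, z, k, v, q, beta, phi):
    """One illustrative fast-weight memory step (delta-rule flavor).

    W: (d_v, d_phi) fast weight matrix; z: (d_phi,) normalizer,
    assumed initialized to small positive values to avoid division by zero;
    k, v, q: current key, value, query vectors; beta: write strength in (0, 1);
    phi: a non-negative feature map applied to keys and queries.
    """
    pk, pq = phi(k), phi(q)
    # Retrieve the value currently stored under this key.
    v_bar = (W @ pk) / (z @ pk)
    # Delta-rule write: move the stored value toward the new one.
    W = W + beta * torch.outer(v - v_bar, pk)
    z = z + pk
    # Read with the query, normalized by 1 / (z . phi(q)).
    out = (W @ pq) / (z @ pq)
    return W, z, out
```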
@@ -73,7 +73,7 @@
 <p>Here is the <a href="https://nn.labml.ai/transformers/fast_weights/index.html">annotated implementation</a>.
 Here are <a href="https://nn.labml.ai/transformers/fast_weights/experiment.html">the training code</a>
 and a notebook for training a fast weights transformer on the Tiny Shakespeare dataset.</p>
-<p><a href="https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/transformers/fast_weights/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
+<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/fast_weights/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
 <a href="https://app.labml.ai/run/928aadc0846c11eb85710242ac1c0002"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen" /></a></p>
 </div>
 <div class='code'>
@@ -72,7 +72,7 @@
 You can pick the original feedback transformer or the new version
 where the keys and values are precalculated.</p>
 <p>Here’s a Colab notebook for training a feedback transformer on Tiny Shakespeare dataset.</p>
-<p><a href="https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/transformers/feedback/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
+<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/feedback/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
 <a href="https://app.labml.ai/run/d8eb9416530a11eb8fb50242ac1c0002"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen" /></a></p>
 </div>
 <div class='code'>
@@ -91,7 +91,7 @@ them cached.
 The <a href="#shared_kv">second half</a> of this file implements this.
 We implemented a custom PyTorch function to improve performance.</p>
 <p>Here’s <a href="experiment.html">the training code</a> and a notebook for training a feedback transformer on Tiny Shakespeare dataset.</p>
-<p><a href="https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/transformers/feedback/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
+<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/feedback/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
 <a href="https://app.labml.ai/run/d8eb9416530a11eb8fb50242ac1c0002"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen" /></a></p>
 </div>
 <div class='code'>
@@ -91,8 +91,8 @@ them cached.
 The <a href="#shared_kv">second half</a> of this file implements this.
 We implemented a custom PyTorch function to improve performance.</p>
 <p>Here’s <a href="experiment.html">the training code</a> and a notebook for training a feedback transformer on Tiny Shakespeare dataset.</p>
-<p><a href="https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/transformers/feedback/experiment.ipynb">Colab Notebook</a></p>
-<p><a href="https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/transformers/feedback/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
+<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/feedback/experiment.ipynb">Colab Notebook</a></p>
+<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/feedback/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
 <a href="https://app.labml.ai/run/d8eb9416530a11eb8fb50242ac1c0002"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen" /></a></p>
 </div>
 <div class='code'>
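The feedback-transformer hunks above mention a shared memory that every layer attends to, with keys and values that can be precalculated and cached (the repo's <code>#shared_kv</code> section with its custom PyTorch function is the real implementation). A hedged sketch of just the memory-merging step, with hypothetical names:

```python
import torch
import torch.nn as nn

class FeedbackMemory(nn.Module):
    """Sketch of the feedback transformer's per-step memory merge.

    At each step, representations from all layers are combined into a
    single memory vector via a softmax-weighted sum of learned scalars;
    every layer then attends over the sequence of these memory vectors.
    Since memory never changes once written, keys/values computed from
    it can be cached, as the diff's surrounding text describes.
    """
    def __init__(self, n_layers: int):
        super().__init__()
        # One learnable mixing weight per layer (plus the embedding).
        self.weights = nn.Parameter(torch.zeros(n_layers + 1))

    def forward(self, layer_outputs: torch.Tensor) -> torch.Tensor:
        # layer_outputs: (n_layers + 1, batch, d_model) for the current step
        mix = torch.softmax(self.weights, dim=0)
        return torch.einsum('l,lbd->bd', mix, layer_outputs)
```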
@@ -72,7 +72,7 @@
 We try different variants for the <a href="../feed_forward">position-wise feedforward network</a>.</p>
 <p><em>This is a simpler implementation that doesn’t use <a href="experiment.html"><code>labml.configs</code></a> module.
 We decided to write a simpler implementation to make it easier for readers who are not familiar.</em></p>
-<p><a href="https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/transformers/glu_variants/simple.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
+<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/glu_variants/simple.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
 <a href="https://app.labml.ai/run/86b773f65fc911ebb2ac0242ac1c0002"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen" /></a></p>
 </div>
 <div class='code'>
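For context on the GLU-variant position-wise feedforward networks this experiment tries, here is a minimal sketch of one such variant (GEGLU, from Shazeer's "GLU Variants Improve Transformer"); the class name and dimensions are illustrative, not taken from the repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeGLU(nn.Module):
    """Sketch of a GEGLU position-wise feedforward network:
    FFN_GEGLU(x) = (GELU(x W1) * x V) W2, where * is elementwise."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff)   # gated branch (GELU)
        self.value = nn.Linear(d_model, d_ff)  # linear branch
        self.proj = nn.Linear(d_ff, d_model)   # project back to d_model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(F.gelu(self.gate(x)) * self.value(x))
```

Other variants in the same family (ReGLU, SwiGLU, Bilinear) differ only in the activation applied to the gated branch.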
@@ -85,7 +85,7 @@ are the parameter initialization, weight decay, and learning rate schedule.
 For the transformer we reuse the
 <a href="../transformers/index.html">existing labml/nn transformer implementation</a>.</p>
 <p>Here’s a notebook for training a GPT model on Tiny Shakespeare dataset.</p>
-<p><a href="https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/transformers/gpt/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
+<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/gpt/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
 <a href="https://app.labml.ai/run/0324c6d0562111eba65d0242ac1c0002"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen" /></a></p>
 </div>
 <div class='code'>
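The GPT hunk above highlights parameter initialization, weight decay, and the learning rate schedule as the important changes over a plain transformer. A hedged sketch of one common way to realize the weight-decay part; the function name and hyperparameter values are assumptions, not the repo's exact configuration:

```python
import torch
import torch.nn as nn

def configure_optimizer(model: nn.Module, lr: float = 6e-4,
                        weight_decay: float = 0.1) -> torch.optim.AdamW:
    """Sketch of GPT-style selective weight decay: decay weight
    matrices only, exempting biases and layer-norm/embedding gains."""
    decay, no_decay = [], []
    for p in model.parameters():
        if not p.requires_grad:
            continue
        # Heuristic: parameters with >= 2 dimensions are weight matrices
        # and get decay; 1-D parameters (biases, norm gains) do not.
        (decay if p.dim() >= 2 else no_decay).append(p)
    return torch.optim.AdamW(
        [{'params': decay, 'weight_decay': weight_decay},
         {'params': no_decay, 'weight_decay': 0.0}],
        lr=lr, betas=(0.9, 0.95))
```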
@@ -89,7 +89,7 @@ In a distributed setup you would have each FFN (each very large) on a different
 <p>The paper introduces another loss term to balance load among the experts (FFNs) and
 discusses dropping tokens when routing is not balanced.</p>
 <p>Here’s <a href="experiment.html">the training code</a> and a notebook for training a switch transformer on Tiny Shakespeare dataset.</p>
-<p><a href="https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/transformers/switch/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
+<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/switch/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
 <a href="https://app.labml.ai/run/c4656c605b9311eba13d0242ac1c0002"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen" /></a></p>
 </div>
 <div class='code'>
@@ -89,7 +89,7 @@ In a distributed setup you would have each FFN (each very large) on a different
 <p>The paper introduces another loss term to balance load among the experts (FFNs) and
 discusses dropping tokens when routing is not balanced.</p>
 <p>Here’s <a href="experiment.html">the training code</a> and a notebook for training a switch transformer on Tiny Shakespeare dataset.</p>
-<p><a href="https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/transformers/switch/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
+<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/switch/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
 <a href="https://app.labml.ai/run/c4656c605b9311eba13d0242ac1c0002"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen" /></a></p>
 </div>
 <div class='code'>
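The switch-transformer hunks above refer to an extra loss term that balances load among the experts. A sketch of that auxiliary loss as the Switch Transformer paper describes it, $n \sum_i f_i P_i$; the function name and tensor shapes here are illustrative:

```python
import torch

def load_balancing_loss(route_probs: torch.Tensor) -> torch.Tensor:
    """Sketch of the switch transformer's auxiliary load-balancing loss.

    route_probs: (n_tokens, n_experts) softmax outputs of the router.
    f_i = fraction of tokens whose top-1 expert is i;
    P_i = mean routing probability assigned to expert i.
    The loss n_experts * sum_i f_i * P_i is minimized when routing
    is uniform across experts.
    """
    n_tokens, n_experts = route_probs.shape
    top1 = route_probs.argmax(dim=-1)
    # Fraction of tokens dispatched to each expert (f_i).
    f = torch.bincount(top1, minlength=n_experts).float() / n_tokens
    # Mean routing probability per expert (P_i).
    p = route_probs.mean(dim=0)
    return n_experts * torch.sum(f * p)
```

In the paper this term is scaled by a small coefficient (around 1e-2) and added to the language-modeling loss.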
@@ -84,7 +84,7 @@ They introduce relative positional encoding, where the positional encodings
 are introduced at the attention calculation.</p>
 <p>Annotated implementation of relative multi-headed attention is in <a href="relative_mha.html"><code>relative_mha.py</code></a>.</p>
 <p>Here’s <a href="experiment.html">the training code</a> and a notebook for training a transformer XL model on Tiny Shakespeare dataset.</p>
-<p><a href="https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/transformers/xl/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
+<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/xl/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
 <a href="https://app.labml.ai/run/d3b6760c692e11ebb6a70242ac1c0002"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen" /></a></p>
 </div>
 <div class='code'>
@@ -84,7 +84,7 @@ They introduce relative positional encoding, where the positional encodings
 are introduced at the attention calculation.</p>
 <p>Annotated implementation of relative multi-headed attention is in <a href="https://nn.labml.ai/transformers/xl/relative_mha.html"><code>relative_mha.py</code></a>.</p>
 <p>Here’s <a href="https://nn.labml.ai/transformers/xl/experiment.html">the training code</a> and a notebook for training a transformer XL model on Tiny Shakespeare dataset.</p>
-<p><a href="https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/transformers/xl/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
+<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/transformers/xl/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
 <a href="https://app.labml.ai/run/d3b6760c692e11ebb6a70242ac1c0002"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen" /></a></p>
 </div>
 <div class='code'>
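The Transformer-XL hunks above note that positional information enters at the attention calculation via relative positional encodings. For orientation, here is a sketch of the widely used "relative shift" trick for rearranging scores computed against relative position embeddings; the repo's annotated relative_mha.py may implement this step differently:

```python
import torch

def relative_shift(x: torch.Tensor) -> torch.Tensor:
    """Sketch of the relative-shift trick: given attention scores against
    relative position embeddings, shift row i by i so that each query's
    scores line up with relative distance to each key.

    x: (query_len, key_len, ...) scores against relative positions.
    """
    q_len, k_len = x.shape[0], x.shape[1]
    # Pad one zero column, reinterpret the buffer so rows realign,
    # drop the extra row, and restore the original shape.
    zero_pad = x.new_zeros(q_len, 1, *x.shape[2:])
    padded = torch.cat([zero_pad, x], dim=1)              # (q, k + 1, ...)
    padded = padded.view(k_len + 1, q_len, *x.shape[2:])  # realigned rows
    return padded[1:].view_as(x)
```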