Mirror of https://github.com/labmlai/annotated_deep_learning_paper_implementations.git (synced 2025-08-26 08:41:23 +08:00)

Commit: papers list
@@ -69,7 +69,7 @@
 </div>
 <h1>Switch Transformer</h1>
 <p>This is a miniature <a href="https://pytorch.org">PyTorch</a> implementation of the paper
-<a href="https://arxiv.org/abs/2101.03961">Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity</a>.
+<a href="https://papers.labml.ai/paper/2101.03961">Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity</a>.
 Our implementation only has a few million parameters and doesn’t do model parallel distributed training.
 It does single GPU training, but we implement the concept of switching as described in the paper.</p>
 <p>The Switch Transformer uses different parameters for each token by switching among parameters

@@ -69,7 +69,7 @@
 </div>
 <h1><a href="https://nn.labml.ai/transformers/switch/index.html">Switch Transformer</a></h1>
 <p>This is a miniature <a href="https://pytorch.org">PyTorch</a> implementation of the paper
-<a href="https://arxiv.org/abs/2101.03961">Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity</a>.
+<a href="https://papers.labml.ai/paper/2101.03961">Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity</a>.
 Our implementation only has a few million parameters and doesn’t do model parallel distributed training.
 It does single GPU training, but we implement the concept of switching as described in the paper.</p>
 <p>The Switch Transformer uses different parameters for each token by switching among parameters
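
The documentation text in the diff above refers to "the concept of switching": each token is routed to a single expert feed-forward network chosen by a learned router, so different tokens use different parameters. The sketch below is an illustrative, minimal PyTorch version of that top-1 routing idea only; the class name SwitchFFNSketch and all parameter names are invented for this example, and it omits details of the actual implementation linked above (e.g. load balancing and capacity limits).

# A minimal sketch (not the repository's code) of top-1 "switch" routing:
# each token is sent to a single expert FFN chosen by a learned router.
import torch
import torch.nn as nn

class SwitchFFNSketch(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        # One position-wise feed-forward network per expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        # Router that produces a probability per expert for each token
        self.router = nn.Linear(d_model, n_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [seq_len, batch_size, d_model]; flatten to a list of tokens
        seq_len, batch_size, d_model = x.shape
        tokens = x.reshape(-1, d_model)
        probs = torch.softmax(self.router(tokens), dim=-1)
        # Top-1 routing: each token goes to its highest-probability expert
        gate, expert_idx = probs.max(dim=-1)
        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                out[mask] = expert(tokens[mask])
        # Scale by the gate value so the router receives gradients
        out = out * gate.unsqueeze(-1)
        return out.reshape(seq_len, batch_size, d_model)


if __name__ == "__main__":
    layer = SwitchFFNSketch(d_model=64, d_ff=256, n_experts=4)
    y = layer(torch.randn(10, 2, 64))
    print(y.shape)  # torch.Size([10, 2, 64])

This keeps the layer's input and output shapes identical to a standard transformer FFN block, which is why switching can be dropped into single-GPU training as the documentation describes.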