papers list

Varuna Jayasiri
2021-08-17 15:27:00 +05:30
parent 996b58be04
commit e28d6ed0a3
73 changed files with 359 additions and 138 deletions


@@ -69,7 +69,7 @@
</div>
<h1>Switch Transformer</h1>
<p>This is a miniature <a href="https://pytorch.org">PyTorch</a> implementation of the paper
<a href="https://arxiv.org/abs/2101.03961">Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity</a>.
<a href="https://papers.labml.ai/paper/2101.03961">Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity</a>.
Our implementation only has a few million parameters and doesn&rsquo;t do model parallel distributed training.
It does single GPU training, but we implement the concept of switching as described in the paper.</p>
<p>The Switch Transformer uses different parameters for each token by switching among parameters


@@ -69,7 +69,7 @@
</div>
<h1><a href="https://nn.labml.ai/transformers/switch/index.html">Switch Transformer</a></h1>
<p>This is a miniature <a href="https://pytorch.org">PyTorch</a> implementation of the paper
<a href="https://arxiv.org/abs/2101.03961">Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity</a>.
<a href="https://papers.labml.ai/paper/2101.03961">Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity</a>.
Our implementation only has a few million parameters and doesn&rsquo;t do model parallel distributed training.
It does single GPU training, but we implement the concept of switching as described in the paper.</p>
<p>The Switch Transformer uses different parameters for each token by switching among parameters
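The hunks above only swap the paper link; the surrounding text describes the switching idea itself, where each token is routed to one expert feed-forward network by a learned router. For context, here is a minimal, hypothetical PyTorch sketch of that top-1 switch routing. The SwitchFFN class and all names in it are illustrative assumptions, not the repository's actual implementation (see https://nn.labml.ai/transformers/switch/index.html for that).

```python
import torch
import torch.nn as nn

class SwitchFFN(nn.Module):
    """Hypothetical sketch of token-level switch routing (top-1 gating):
    each token is processed by exactly one expert FFN chosen by a router."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int):
        super().__init__()
        # Router produces one logit per expert for each token.
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [seq_len, batch, d_model]; flatten to one row per token.
        tokens = x.reshape(-1, x.shape[-1])
        probs = torch.softmax(self.router(tokens), dim=-1)
        gate, idx = probs.max(dim=-1)  # top-1 expert index per token
        out = torch.zeros_like(tokens)
        for i, expert in enumerate(self.experts):
            mask = idx == i
            if mask.any():
                out[mask] = expert(tokens[mask])
        # Scale by the gate value so the router gets gradients.
        return (gate.unsqueeze(-1) * out).reshape(x.shape)

if __name__ == "__main__":
    layer = SwitchFFN(d_model=64, d_ff=256, n_experts=4)
    y = layer(torch.randn(10, 2, 64))  # same shape out as in
    print(y.shape)
```

Because each token passes through only one expert, parameter count grows with the number of experts while per-token compute stays roughly constant, which is the point the changed documentation pages make.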