Mirror of https://github.com/labmlai/annotated_deep_learning_paper_implementations.git, synced 2025-11-01 20:28:41 +08:00
pytorch link
@@ -9,7 +9,8 @@ summary: >
 # Capsule Networks
-This is a PyTorch implementation and tutorial of [Dynamic Routing Between Capsules](https://arxiv.org/abs/1710.09829).
+This is a [PyTorch](https://pytorch.org) implementation/tutorial of
+[Dynamic Routing Between Capsules](https://arxiv.org/abs/1710.09829).
 Capsule networks are a neural network architecture that embeds features
 as capsules and routes them with a voting mechanism to the next layer of capsules.
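As a concrete illustration of the capsule idea above, the paper's squashing non-linearity keeps a capsule's direction while mapping its length into (0, 1). The sketch below is a minimal stand-alone version; the shapes and the `eps` constant are illustrative assumptions, not code from this commit:

```python
import torch

def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    # Shrink capsule vectors so their length lies in (0, 1) while keeping direction
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

# Example: 10 capsules of 16 dimensions each
capsules = squash(torch.randn(10, 16))
```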
@@ -8,7 +8,7 @@ summary: >
 # Cycle GAN
-This is an implementation of paper
+This is a [PyTorch](https://pytorch.org) implementation/tutorial of paper
 [Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://arxiv.org/abs/1703.10593).
 I've taken pieces of code from [eriklindernoren/PyTorch-GAN](https://github.com/eriklindernoren/PyTorch-GAN).
@@ -6,7 +6,7 @@ summary: A simple PyTorch implementation/tutorial of Deep Convolutional Generati
 # Deep Convolutional Generative Adversarial Networks (DCGAN)
-This is an implementation of paper
+This is a [PyTorch](https://pytorch.org) implementation of paper
 [Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/abs/1511.06434).
 This implementation is based on the [PyTorch DCGAN Tutorial](https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html).
@@ -7,7 +7,8 @@ summary: A PyTorch implementation/tutorial of HyperLSTM introduced in paper Hype
 # HyperNetworks - HyperLSTM
 We have implemented HyperLSTM introduced in paper
-[HyperNetworks](https://arxiv.org/abs/1609.09106), with annotations.
+[HyperNetworks](https://arxiv.org/abs/1609.09106), with annotations
+using [PyTorch](https://pytorch.org).
 [This blog post](https://blog.otoro.net/2016/09/28/hyper-networks/)
 by David Ha gives a good explanation of HyperNetworks.
@@ -17,7 +18,7 @@ Here's the link to code: [`experiment.py`](experiment.html)
 [](https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/hypernetworks/experiment.ipynb)
 [](https://web.lab-ml.com/run?uuid=9e7f39e047e811ebbaff2b26e3148b3d)
-HyperNetworks uses a smaller network to generate weights of a larger network.
+HyperNetworks use a smaller network to generate weights of a larger network.
 There are two variants: static hyper-networks and dynamic hyper-networks.
 Static HyperNetworks have a smaller network that generates weights (kernels)
 of a convolutional network. Dynamic HyperNetworks generate parameters of a
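To make the static variant described above concrete, here is a rough sketch of a layer whose convolution kernel is produced by a small linear network from a learned embedding. The class name, sizes, and single-layer generator are assumptions for illustration, not the repository's HyperLSTM code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StaticHyperConv(nn.Module):
    """Generate the kernel of a conv layer from a small per-layer embedding."""
    def __init__(self, z_dim: int, in_channels: int, out_channels: int, k: int = 3):
        super().__init__()
        self.z = nn.Parameter(torch.randn(z_dim))                     # layer embedding
        self.to_kernel = nn.Linear(z_dim, out_channels * in_channels * k * k)
        self.kernel_shape = (out_channels, in_channels, k, k)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        kernel = self.to_kernel(self.z).view(self.kernel_shape)       # generated weights
        return F.conv2d(x, kernel, padding=1)

y = StaticHyperConv(z_dim=8, in_channels=3, out_channels=16)(torch.randn(1, 3, 32, 32))
```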
@@ -5,6 +5,8 @@ summary: A simple PyTorch implementation/tutorial of Long Short-Term Memory (LST
 ---
 # Long Short-Term Memory (LSTM)
+This is a [PyTorch](https://pytorch.org) implementation of Long Short-Term Memory.
 """
 from typing import Optional, Tuple
@@ -11,7 +11,7 @@ This is based from AdaBelief
 of the paper
 [AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients](https://arxiv.org/abs/2010.07468).
-This is implemented here as an extension to [RAdam](radam.html).
+This is implemented in [PyTorch](https://pytorch.org) as an extension to [RAdam](radam.html).
 The main difference between the Adam optimizer and AdaBelief is
 how it calculates the adaptive learning rate;
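A stripped-down sketch of that difference: Adam accumulates the second moment of the raw gradient, while AdaBelief accumulates the second moment of the gradient's deviation from its running mean. Bias correction and weight decay are omitted, and the function below is illustrative rather than the repository's optimizer class:

```python
import torch

def adam_like_step(param, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999,
                   eps=1e-8, belief=False):
    # First moment (exponential moving average of the gradient)
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    if belief:
        # AdaBelief: second moment of the gradient's deviation from its mean
        v.mul_(beta2).addcmul_(grad - m, grad - m, value=1 - beta2)
    else:
        # Adam: second moment of the raw gradient
        v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    # Parameter update with the adaptive learning rate
    param.addcdiv_(m, v.sqrt().add_(eps), value=-lr)
```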
@@ -6,7 +6,7 @@ summary: A simple PyTorch implementation/tutorial of Adam optimizer
 # Adam Optimizer
-This is an implementation of popular optimizer *Adam* from paper
+This is a [PyTorch](https://pytorch.org) implementation of popular optimizer *Adam* from paper
 [Adam: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980v9).
 *Adam* update is,
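For reference, the update introduced by that last line (the equations themselves fall outside the hunk) is, in its standard form from the cited paper:

```latex
\begin{aligned}
m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t \\
v_t &\leftarrow \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \\
\hat{m}_t &\leftarrow \frac{m_t}{1 - \beta_1^t}, \qquad
\hat{v}_t \leftarrow \frac{v_t}{1 - \beta_2^t} \\
\theta_t &\leftarrow \theta_{t-1} - \alpha \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
```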
@@ -6,7 +6,7 @@ summary: A simple PyTorch implementation/tutorial of AMSGrad optimizer.
 # AMSGrad
-This is an implementation of the paper
+This is a [PyTorch](https://pytorch.org) implementation of the paper
 [On the Convergence of Adam and Beyond](https://arxiv.org/abs/1904.09237).
 We implement this as an extension to our [Adam optimizer implementation](adam.html).
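The single change AMSGrad makes on top of Adam is to keep a running maximum of the second moment and use that in the update's denominator. A minimal sketch of that change, with illustrative names:

```python
import torch

def amsgrad_second_moment(v: torch.Tensor, v_max: torch.Tensor,
                          grad: torch.Tensor, beta2: float = 0.999) -> torch.Tensor:
    # Usual Adam second-moment update
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    # AMSGrad: never let the effective second moment shrink
    torch.maximum(v_max, v, out=v_max)
    return v_max  # use v_max (not v) in the denominator of the parameter update
```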
@@ -8,7 +8,7 @@ summary: >
 # Noam Optimizer
-This is the implementation of optimizer introduced in the paper
+This is the [PyTorch](https://pytorch.org) implementation of the optimizer introduced in the paper
 [Attention Is All You Need](https://arxiv.org/abs/1706.03762).
 """
 from typing import Dict
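The schedule this optimizer applies on top of Adam, as given in Attention Is All You Need, can be sketched as follows; the `warmup` and `d_model` defaults are just example values:

```python
def noam_lr(step: int, d_model: int = 512, warmup: int = 4000, factor: float = 1.0) -> float:
    # Learning rate rises linearly for `warmup` steps, then decays as 1/sqrt(step)
    step = max(step, 1)
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

print(noam_lr(100), noam_lr(4000), noam_lr(100000))
```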
@@ -11,7 +11,8 @@ This implementation is based on
 of the paper
 [On the Variance of the Adaptive Learning Rate and Beyond](https://arxiv.org/abs/1908.03265).
-We have implemented it as an extension to [our AMSGrad implementation](amsgrad.html)
+We have implemented it in [PyTorch](https://pytorch.org)
+as an extension to [our AMSGrad implementation](amsgrad.html)
 thus requiring only the modifications to be implemented.
 The Adam optimizer sometimes converges to a bad local optimum during the initial stages of training;
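RAdam addresses that early-training problem by rectifying the variance of the adaptive learning rate. Below is a sketch of the rectification term from the paper; the repository's version is organized as an optimizer class, so treat this as an outline only:

```python
import math

def radam_rectification(step: int, beta2: float = 0.999):
    """Variance rectification term from the RAdam paper.

    Returns None when the adaptive learning rate is not yet tractable
    (early steps), in which case plain SGD-with-momentum is used instead.
    """
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** step
    rho_t = rho_inf - 2.0 * step * beta2_t / (1.0 - beta2_t)
    if rho_t <= 4.0:
        return None
    r2 = ((rho_t - 4) * (rho_t - 2) * rho_inf) / ((rho_inf - 4) * (rho_inf - 2) * rho_t)
    return math.sqrt(r2)
```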
@@ -6,7 +6,7 @@ summary: A simple PyTorch implementation/tutorial of Recurrent Highway Networks.
 # Recurrent Highway Networks
-This is an implementation of [Recurrent Highway Networks](https://arxiv.org/abs/1607.03474).
+This is a [PyTorch](https://pytorch.org) implementation of [Recurrent Highway Networks](https://arxiv.org/abs/1607.03474).
 """
 from typing import Optional
@@ -11,7 +11,7 @@ summary: >
 # Deep Q Networks (DQN)
-This is an implementation of paper
+This is a [PyTorch](https://pytorch.org) implementation of paper
 [Playing Atari with Deep Reinforcement Learning](https://arxiv.org/abs/1312.5602)
 along with [Dueling Network](model.html), [Prioritized Replay](replay_buffer.html)
 and Double Q Network.
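To make the Double Q Network part concrete: the online network picks the greedy action and the target network evaluates it. A minimal sketch with assumed tensor shapes, not the repository's agent code:

```python
import torch

def double_q_target(reward, done, next_q_online, next_q_target, gamma: float = 0.99):
    # Online network chooses the greedy action, target network evaluates it
    best_action = next_q_online.argmax(dim=-1, keepdim=True)
    next_value = next_q_target.gather(-1, best_action).squeeze(-1)
    return reward + gamma * (1.0 - done) * next_value

# Shapes: reward/done -> (batch,), next_q_* -> (batch, n_actions)
target = double_q_target(torch.zeros(4), torch.zeros(4), torch.randn(4, 2), torch.randn(4, 2))
```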
@@ -7,7 +7,8 @@ summary: >
 # Proximal Policy Optimization (PPO)
-This is a an implementation of [Proximal Policy Optimization - PPO](https://arxiv.org/abs/1707.06347).
+This is a [PyTorch](https://pytorch.org) implementation of
+[Proximal Policy Optimization - PPO](https://arxiv.org/abs/1707.06347).
 You can find an experiment that uses it [here](experiment.html).
 The experiment uses [Generalized Advantage Estimation](gae.html).
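The core of PPO is the clipped surrogate objective; a minimal sketch (the full implementation also handles the value and entropy losses):

```python
import torch

def ppo_clip_loss(log_pi: torch.Tensor, log_pi_old: torch.Tensor,
                  advantage: torch.Tensor, clip: float = 0.2) -> torch.Tensor:
    # Probability ratio between the new and old policies
    ratio = torch.exp(log_pi - log_pi_old)
    clipped = torch.clamp(ratio, 1.0 - clip, 1.0 + clip)
    # Take the pessimistic (minimum) surrogate and negate it to get a loss
    return -torch.min(ratio * advantage, clipped * advantage).mean()
```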
@@ -6,7 +6,8 @@ summary: A PyTorch implementation/tutorial of Generalized Advantage Estimation (
 # Generalized Advantage Estimation (GAE)
-This is an implementation of paper [Generalized Advantage Estimation](https://arxiv.org/abs/1506.02438).
+This is a [PyTorch](https://pytorch.org) implementation of paper
+[Generalized Advantage Estimation](https://arxiv.org/abs/1506.02438).
 """
 import numpy as np
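A minimal single-trajectory sketch of the estimator; it ignores episode-termination masks, which the full implementation handles:

```python
import numpy as np

def gae(rewards, values, last_value, gamma: float = 0.99, lam: float = 0.95):
    """Generalized Advantage Estimation over one trajectory (no episode resets)."""
    advantages = np.zeros(len(rewards), dtype=np.float64)
    next_value, next_advantage = last_value, 0.0
    for t in reversed(range(len(rewards))):
        # TD error at step t
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of TD errors
        next_advantage = delta + gamma * lam * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    return advantages
```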
@@ -8,7 +8,7 @@ summary: >
 # Sketch RNN
-This is an annotated implementation of the paper
+This is an annotated [PyTorch](https://pytorch.org) implementation of the paper
 [A Neural Representation of Sketch Drawings](https://arxiv.org/abs/1704.03477).
 Sketch RNN is a sequence-to-sequence variational auto-encoder.
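Since it is a variational auto-encoder, the encoder output is mapped to a latent vector with the usual reparameterization trick; a minimal sketch with assumed dimensions, not the repository's model code:

```python
import torch
import torch.nn as nn

class LatentSampler(nn.Module):
    """Map an encoder state to (mu, log_sigma) and sample z = mu + sigma * eps."""
    def __init__(self, enc_dim: int = 512, z_dim: int = 128):
        super().__init__()
        self.mu = nn.Linear(enc_dim, z_dim)
        self.log_sigma = nn.Linear(enc_dim, z_dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        mu, log_sigma = self.mu(h), self.log_sigma(h)
        return mu + torch.exp(log_sigma / 2) * torch.randn_like(mu)

z = LatentSampler()(torch.randn(1, 512))
```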
@@ -6,6 +6,9 @@ summary: Documented reusable implementation of the position wise feedforward net
 # Position-wise Feed-Forward Network (FFN)
+This is a [PyTorch](https://pytorch.org) implementation
+of the position-wise feedforward network used in transformers.
 FFN consists of two fully connected layers.
 The number of dimensions in the hidden layer, $d_{ff}$, is generally set to around
 four times that of the token embedding $d_{model}$.
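The two-layer structure described above, as a self-contained sketch with the conventional $d_{ff} = 4 \cdot d_{model}$ sizing (the dropout placement is an assumption):

```python
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    """Position-wise FFN: expand to d_ff, apply a non-linearity, project back."""
    def __init__(self, d_model: int = 512, d_ff: int = 2048, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

out = FeedForward()(torch.randn(10, 2, 512))  # (seq_len, batch, d_model)
```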
@@ -7,10 +7,10 @@ summary: >
 # Feedback Transformer
-This is an implementation of the paper
+This is a [PyTorch](https://pytorch.org) implementation of the paper
 [Accessing Higher-level Representations in Sequential Transformers with Feedback Memory](https://arxiv.org/abs/2002.09402).
-Normal transformers process tokens in parallel and each transformer layer pays attention
+Normal transformers process tokens in parallel. Each transformer layer pays attention
 to the outputs of the previous layer.
 The feedback transformer pays attention to the outputs of all layers at previous steps.
 So this adds recurrence, and we need to process tokens one by one.
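The recurrence described above comes from a memory vector that mixes the outputs of all layers at each step with learned weights, and later steps attend to these memory vectors. A minimal sketch of just that mixing step; shapes and names are assumptions:

```python
import torch

def feedback_memory(layer_outputs: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """Mix the outputs of all layers at one step into a single memory vector.

    layer_outputs: (n_layers + 1, batch, d_model) -- embedding plus each layer's output
    weights:       (n_layers + 1,) learned mixing logits
    """
    w = torch.softmax(weights, dim=0)
    return torch.einsum('l,lbd->bd', w, layer_outputs)

mem = feedback_memory(torch.randn(5, 2, 64), torch.randn(5))
```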
@@ -7,8 +7,9 @@ summary: >
 # GPT
-This is an tutorial of
-[OpenAI GPT architecture](https://openai.com/blog/better-language-models/).
+This is a tutorial/implementation of
+[OpenAI GPT architecture](https://openai.com/blog/better-language-models/)
+in [PyTorch](https://pytorch.org).
 We got a bunch of implementation details from
 [minGPT](https://github.com/karpathy/minGPT)
 by [@karpathy](https://twitter.com/karpathy).
@@ -11,7 +11,7 @@ summary: >
 # k-Nearest Neighbor Language Models
-This is an implementation of the paper
+This is a [PyTorch](https://pytorch.org) implementation of the paper
 [Generalization through Memorization: Nearest Neighbor Language Models](https://arxiv.org/abs/1911.00172).
 It uses k-nearest neighbors to improve perplexity of autoregressive transformer models.
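The perplexity improvement comes from interpolating a distribution built from retrieved nearest neighbours with the language model's own distribution. A minimal sketch; the interpolation weight `lam` is an example value:

```python
import torch

def knn_lm_probs(p_lm: torch.Tensor, p_knn: torch.Tensor, lam: float = 0.25) -> torch.Tensor:
    # p_lm:  (batch, vocab) probabilities from the transformer
    # p_knn: (batch, vocab) probabilities derived from nearest-neighbour targets
    return lam * p_knn + (1.0 - lam) * p_lm
```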
@@ -9,7 +9,8 @@ summary: >
 # Relative Multi-Headed Attention
 This is an implementation of
-[Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860).
+[Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860)
+in [PyTorch](https://pytorch.org).
 Transformer has a limited attention span,
 equal to the length of the sequence trained in parallel.
@@ -7,7 +7,7 @@ summary: >
 # Switch Transformer
-This is a miniature implementation of the paper
+This is a miniature [PyTorch](https://pytorch.org) implementation of the paper
 [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://arxiv.org/abs/2101.03961).
 Our implementation only has a few million parameters and doesn't do model parallel distributed training.
 It does single GPU training but we implement the concept of switching as described in the paper.
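The switching concept mentioned above routes each token to a single expert chosen by a learned gate. A minimal single-device sketch; linear "experts" stand in for the FFN experts, and there is no load-balancing loss here:

```python
import torch
import torch.nn as nn

class SwitchRouter(nn.Module):
    """Route each token to the one expert with the highest gate probability."""
    def __init__(self, d_model: int = 64, n_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        probs = torch.softmax(self.gate(x), dim=-1)
        top_prob, expert_idx = probs.max(dim=-1)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # Scale by the gate probability so the router receives gradient
                out[mask] = top_prob[mask].unsqueeze(-1) * expert(x[mask])
        return out

y = SwitchRouter()(torch.randn(8, 64))
```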