diff --git a/docs/capsule_networks/index.html b/docs/capsule_networks/index.html
index e563ece7..334c1d50 100644
--- a/docs/capsule_networks/index.html
+++ b/docs/capsule_networks/index.html
@@ -68,7 +68,7 @@

Capsule Networks

This is a PyTorch implementation/tutorial of -Dynamic Routing Between Capsules.

+Dynamic Routing Between Capsules.

A capsule network is a neural network architecture that embeds features as capsules and routes them with a voting mechanism to the next layer of capsules.
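
For intuition only (a hedged sketch, not the code in this implementation): a capsule's output is a vector, and its length is squashed into [0, 1) so it can be read as the probability that the feature exists. The function name below is illustrative.

import torch

def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    # Squash a capsule vector so its length lies in [0, 1) while keeping its
    # direction: (|s|^2 / (1 + |s|^2)) * (s / |s|), as in the paper.
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

v = squash(torch.randn(2, 10, 16))  # a batch of 2 samples, 10 capsules of dimension 16
print(v.norm(dim=-1))  # all lengths are below 1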

Unlike in other implementations of models, we’ve included a sample, because
diff --git a/docs/capsule_networks/mnist.html b/docs/capsule_networks/mnist.html
index e2feb1fb..8b64caf0 100644
--- a/docs/capsule_networks/mnist.html
+++ b/docs/capsule_networks/mnist.html
@@ -69,7 +69,7 @@

Classify MNIST digits with Capsule Networks

This is annotated PyTorch code to classify MNIST digits with capsule networks.

This implements the experiment described in the paper -Dynamic Routing Between Capsules.

+Dynamic Routing Between Capsules.

from typing import Any
diff --git a/docs/capsule_networks/readme.html b/docs/capsule_networks/readme.html
index 4d6e991f..215aa730 100644
--- a/docs/capsule_networks/readme.html
+++ b/docs/capsule_networks/readme.html
@@ -68,7 +68,7 @@
                 

Capsule Networks

This is a PyTorch implementation/tutorial of -Dynamic Routing Between Capsules.

+Dynamic Routing Between Capsules.

A capsule network is a neural network architecture that embeds features as capsules and routes them with a voting mechanism to the next layer of capsules.

Unlike in other implementations of models, we’ve included a sample, because
diff --git a/docs/gan/cycle_gan/index.html b/docs/gan/cycle_gan/index.html
index 9ae58422..5f706732 100644
--- a/docs/gan/cycle_gan/index.html
+++ b/docs/gan/cycle_gan/index.html
@@ -69,7 +69,7 @@

Cycle GAN

This is a PyTorch implementation/tutorial of the paper -Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks.

+Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks.

I’ve taken pieces of code from eriklindernoren/PyTorch-GAN. It is a very good resource if you want to check out other GAN variations too.

Cycle GAN does image-to-image translation.
diff --git a/docs/gan/cycle_gan/readme.html b/docs/gan/cycle_gan/readme.html
index 79637c15..56ebf408 100644
--- a/docs/gan/cycle_gan/readme.html
+++ b/docs/gan/cycle_gan/readme.html
@@ -69,7 +69,7 @@

Cycle GAN

This is a PyTorch implementation/tutorial of the paper -Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks.

+Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks.

diff --git a/docs/gan/dcgan/index.html b/docs/gan/dcgan/index.html
index 820cb788..7ef2b47d 100644
--- a/docs/gan/dcgan/index.html
+++ b/docs/gan/dcgan/index.html
@@ -69,7 +69,7 @@

Deep Convolutional Generative Adversarial Networks (DCGAN)

This is a PyTorch implementation of the paper -Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.

+Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.

This implementation is based on the PyTorch DCGAN Tutorial.

diff --git a/docs/gan/dcgan/readme.html b/docs/gan/dcgan/readme.html
index 7e093b6f..f8472341 100644
--- a/docs/gan/dcgan/readme.html
+++ b/docs/gan/dcgan/readme.html
@@ -69,7 +69,7 @@

Deep Convolutional Generative Adversarial Networks - DCGAN

This is a PyTorch implementation of the paper -Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.

+Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks.

diff --git a/docs/gan/original/index.html b/docs/gan/original/index.html
index 556327ee..39411415 100644
--- a/docs/gan/original/index.html
+++ b/docs/gan/original/index.html
@@ -69,7 +69,7 @@

Generative Adversarial Networks (GAN)

This is an implementation of -Generative Adversarial Networks.

+Generative Adversarial Networks.

The generator, $G(\pmb{z}; \theta_g)$, generates samples that match the distribution of the data, while the discriminator, $D(\pmb{x}; \theta_d)$, gives the probability that $\pmb{x}$ came from the data rather than $G$.
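
As a reference point (a hedged sketch, not this repository's code), the standard objective can be written with binary cross-entropy: the discriminator is pushed towards 1 on data and 0 on generated samples, and the generator is pushed to make the discriminator output 1 on its samples.

import torch
import torch.nn.functional as F

def discriminator_loss(d, real: torch.Tensor, fake: torch.Tensor) -> torch.Tensor:
    # The discriminator should output "real" (1) for data and "fake" (0) for G(z).
    logits_real = d(real)
    logits_fake = d(fake.detach())  # don't backpropagate into the generator here
    return (F.binary_cross_entropy_with_logits(logits_real, torch.ones_like(logits_real))
            + F.binary_cross_entropy_with_logits(logits_fake, torch.zeros_like(logits_fake)))

def generator_loss(d, fake: torch.Tensor) -> torch.Tensor:
    # Non-saturating generator loss: push the discriminator towards "real" on G(z).
    logits_fake = d(fake)
    return F.binary_cross_entropy_with_logits(logits_fake, torch.ones_like(logits_fake))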

diff --git a/docs/gan/original/readme.html b/docs/gan/original/readme.html
index 0bc677c1..152d29e8 100644
--- a/docs/gan/original/readme.html
+++ b/docs/gan/original/readme.html
@@ -69,7 +69,7 @@

Generative Adversarial Networks - GAN

This is an annotated implementation of -Generative Adversarial Networks.

+Generative Adversarial Networks.

diff --git a/docs/gan/stylegan/index.html b/docs/gan/stylegan/index.html
index 68d6be4d..95b7e3d1 100644
--- a/docs/gan/stylegan/index.html
+++ b/docs/gan/stylegan/index.html
@@ -69,12 +69,12 @@

StyleGAN 2

This is a PyTorch implementation of the paper
- Analyzing and Improving the Image Quality of StyleGAN
+ Analyzing and Improving the Image Quality of StyleGAN
which introduces StyleGAN 2. StyleGAN 2 is an improvement over StyleGAN from the paper
- A Style-Based Generator Architecture for Generative Adversarial Networks.
+ A Style-Based Generator Architecture for Generative Adversarial Networks.
And StyleGAN is based on Progressive GAN from the paper
- Progressive Growing of GANs for Improved Quality, Stability, and Variation.
+ Progressive Growing of GANs for Improved Quality, Stability, and Variation.
All three papers are by the same authors at NVIDIA AI.

Our implementation is a minimalistic StyleGAN 2 model training code. Only single GPU training is supported to keep the implementation simple.
@@ -1695,7 +1695,7 @@ since we want to calculate the standard deviation for each feature.

The down-sample operation smoothens each feature channel and scales down by $2 \times$ using bilinear interpolation. This is based on the paper - Making Convolutional Networks Shift-Invariant Again.

+ Making Convolutional Networks Shift-Invariant Again.
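
The same idea can be sketched as a standalone function; the name and the 3×3 binomial kernel below are illustrative, not the exact implementation that follows.

import torch
import torch.nn.functional as F

def blur_and_downsample(x: torch.Tensor) -> torch.Tensor:
    # Smooth each channel with a small binomial kernel, then scale down 2x with
    # bilinear interpolation, which reduces aliasing (the shift-invariance idea).
    kernel = torch.tensor([[1., 2., 1.], [2., 4., 2.], [1., 2., 1.]],
                          device=x.device, dtype=x.dtype)
    kernel = (kernel / kernel.sum()).view(1, 1, 3, 3).repeat(x.shape[1], 1, 1, 1)
    x = F.conv2d(x, kernel, padding=1, groups=x.shape[1])  # per-channel smoothing
    return F.interpolate(x, scale_factor=0.5, mode='bilinear', align_corners=False)

y = blur_and_downsample(torch.randn(4, 3, 64, 64))
print(y.shape)  # torch.Size([4, 3, 32, 32])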

class DownSample(nn.Module):
@@ -1766,7 +1766,7 @@ This is based on the paper

Up-sample

The up-sample operation scales the image up by $2 \times$ and smoothens each feature channel. This is based on the paper - Making Convolutional Networks Shift-Invariant Again.

+ Making Convolutional Networks Shift-Invariant Again.

class UpSample(nn.Module):
@@ -2265,7 +2265,7 @@ Without equalized learning rate, the effective weights will get updated proporti

Gradient Penalty

This is the $R_1$ regularization penalty from the paper -Which Training Methods for GANs do actually Converge?.

+Which Training Methods for GANs do actually Converge?.
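
For reference, a minimal standalone sketch of this penalty (the function name is illustrative): it measures the squared norm of the discriminator's gradient with respect to real images.

import torch

def r1_penalty(discriminator, real: torch.Tensor) -> torch.Tensor:
    # R1: squared norm of the discriminator's gradient at real data points;
    # scale the result by gamma / 2 when adding it to the discriminator loss.
    real = real.detach().requires_grad_(True)
    scores = discriminator(real)
    grad, = torch.autograd.grad(outputs=scores.sum(), inputs=real, create_graph=True)
    return grad.pow(2).reshape(grad.shape[0], -1).sum(dim=1).mean()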

diff --git a/docs/gan/stylegan/readme.html b/docs/gan/stylegan/readme.html
index 6c946567..999171ae 100644
--- a/docs/gan/stylegan/readme.html
+++ b/docs/gan/stylegan/readme.html
@@ -69,12 +69,12 @@

StyleGAN 2

This is a PyTorch implementation of the paper
- Analyzing and Improving the Image Quality of StyleGAN
+ Analyzing and Improving the Image Quality of StyleGAN
which introduces StyleGAN 2. StyleGAN 2 is an improvement over StyleGAN from the paper
- A Style-Based Generator Architecture for Generative Adversarial Networks.
+ A Style-Based Generator Architecture for Generative Adversarial Networks.
And StyleGAN is based on Progressive GAN from the paper
- Progressive Growing of GANs for Improved Quality, Stability, and Variation.
+ Progressive Growing of GANs for Improved Quality, Stability, and Variation.
All three papers are by the same authors at NVIDIA AI.

diff --git a/docs/gan/wasserstein/gradient_penalty/index.html b/docs/gan/wasserstein/gradient_penalty/index.html
index 84057d0d..7796bd39 100644
--- a/docs/gan/wasserstein/gradient_penalty/index.html
+++ b/docs/gan/wasserstein/gradient_penalty/index.html
@@ -73,7 +73,7 @@

Gradient Penalty for Wasserstein GAN (WGAN-GP)

This is an implementation of -Improved Training of Wasserstein GANs.

+Improved Training of Wasserstein GANs.

WGAN suggests clipping weights to enforce the Lipschitz constraint on the discriminator network (critic). This and other weight constraints like L2 norm clipping, weight normalization,
@@ -82,7 +82,7 @@ L1, L2 weight decay have problems:

  • Limiting the capacity of the discriminator
  • Exploding and vanishing gradients (without Batch Normalization).
  • -

The paper Improved Training of Wasserstein GANs +

The paper Improved Training of Wasserstein GANs proposes a better way to enforce the Lipschitz constraint: a gradient penalty.

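For reference, a minimal standalone sketch of such a gradient penalty (hedged; the names are illustrative and image batches are assumed): the critic's gradient norm at points interpolated between real and generated samples is pushed towards 1.

import torch

def gradient_penalty(critic, real: torch.Tensor, fake: torch.Tensor) -> torch.Tensor:
    # Push the critic's gradient norm towards 1 at points interpolated between
    # real and generated samples (NCHW image batches are assumed here).
    eps = torch.rand(real.shape[0], 1, 1, 1, device=real.device)
    x_hat = (eps * real + (1 - eps) * fake).detach().requires_grad_(True)
    scores = critic(x_hat)
    grad, = torch.autograd.grad(outputs=scores.sum(), inputs=x_hat, create_graph=True)
    grad_norm = grad.reshape(grad.shape[0], -1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()
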
-We use double Q-learning, where
+We use double Q-learning, where
the $\operatorname{argmax}$ is taken from $\color{cyan}{\theta_i}$ and the value is taken from $\color{orange}{\theta_i^{-}}$.
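
A hedged sketch of how that target can be computed (tensor names are illustrative, not this implementation's code):

import torch

@torch.no_grad()
def double_q_target(q_online: torch.Tensor, q_target: torch.Tensor, reward: torch.Tensor,
                    done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    # q_online / q_target: Q-values of the next state from the two parameter sets,
    # shape (batch, actions); done: 1.0 where the episode ended, else 0.0.
    best_action = q_online.argmax(dim=-1, keepdim=True)        # argmax from the online parameters
    next_value = q_target.gather(-1, best_action).squeeze(-1)  # value from the target parameters
    return reward + gamma * next_value * (1.0 - done)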

And the loss function becomes,
diff --git a/docs/rl/dqn/model.html b/docs/rl/dqn/model.html
index 71e92fba..306d814a 100644
--- a/docs/rl/dqn/model.html
+++ b/docs/rl/dqn/model.html
@@ -82,7 +82,7 @@ #

    Dueling Network ⚔️ Model for $Q$ Values

    -

We are using a dueling network +

We are using a dueling network to calculate Q-values. The intuition behind the dueling network architecture is that in most states the action doesn’t matter,
diff --git a/docs/rl/dqn/replay_buffer.html b/docs/rl/dqn/replay_buffer.html
index 03693cf8..18c310db 100644
--- a/docs/rl/dqn/replay_buffer.html
+++ b/docs/rl/dqn/replay_buffer.html
@@ -68,7 +68,7 @@ #

    Prioritized Experience Replay Buffer

    -

This implements the paper Prioritized experience replay, +

This implements the paper Prioritized experience replay, using a binary segment tree.

    @@ -83,7 +83,7 @@ using a binary segment tree.

    #

    Buffer for Prioritized Experience Replay

    -

    Prioritized experience replay +

Prioritized experience replay samples important transitions more frequently. The transitions are prioritized by the temporal difference (TD) error, $\delta$.
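
As a reference for that sampling rule, a hedged sketch with plain tensors (the actual implementation uses the binary segment tree mentioned above):

import torch

def sampling_probabilities(td_errors: torch.Tensor, alpha: float = 0.6, eps: float = 1e-6) -> torch.Tensor:
    # Priorities come from the magnitude of the TD error; alpha controls how strong
    # the prioritization is (alpha = 0 recovers uniform sampling).
    priorities = (td_errors.abs() + eps) ** alpha
    return priorities / priorities.sum()

probs = sampling_probabilities(torch.tensor([0.5, 0.1, 2.0]))
idx = torch.multinomial(probs, num_samples=2, replacement=True)  # indices of sampled transitions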

We sample transition $i$ with probability,
diff --git a/docs/rl/ppo/gae.html b/docs/rl/ppo/gae.html
index db2f96a6..f8f78c27 100644
--- a/docs/rl/ppo/gae.html
+++ b/docs/rl/ppo/gae.html
@@ -69,7 +69,7 @@

    Generalized Advantage Estimation (GAE)

This is a PyTorch implementation of the paper -Generalized Advantage Estimation.

    +Generalized Advantage Estimation.

    You can find an experiment that uses it here.

diff --git a/docs/rl/ppo/index.html b/docs/rl/ppo/index.html
index 36e8c4b2..548a35ed 100644
--- a/docs/rl/ppo/index.html
+++ b/docs/rl/ppo/index.html
@@ -69,7 +69,7 @@

    Proximal Policy Optimization - PPO

    This is a PyTorch implementation of -Proximal Policy Optimization - PPO.

    +Proximal Policy Optimization - PPO.

PPO is a policy gradient method for reinforcement learning. Simple policy gradient methods do a single gradient update per sample (or a set of samples). Doing multiple gradient steps for a single sample causes problems
@@ -171,7 +171,7 @@ J(\pi_\theta) - J(\pi_{\theta_{OLD}})
The error we introduce to $J(\pi_\theta) - J(\pi_{\theta_{OLD}})$ by this assumption is bound by the KL divergence between $\pi_\theta$ and $\pi_{\theta_{OLD}}$.
-Constrained Policy Optimization
+Constrained Policy Optimization
shows the proof of this. I haven’t read it.
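
The clipped surrogate objective is what lets PPO take several update steps safely; a hedged sketch, with illustrative names rather than this implementation's code:

import torch

def ppo_clipped_loss(log_prob: torch.Tensor, old_log_prob: torch.Tensor,
                     advantage: torch.Tensor, clip: float = 0.2) -> torch.Tensor:
    # ratio = pi_theta(a|s) / pi_theta_old(a|s); clipping stops a single batch of
    # updates from moving the policy too far from the one that collected the data.
    ratio = torch.exp(log_prob - old_log_prob)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip, 1.0 + clip) * advantage
    return -torch.min(unclipped, clipped).mean()  # negated because we minimize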

    where $\Phi(x) = P(X \le x), X \sim \mathcal{N}(0,1)$

    -

It was introduced in the paper Gaussian Error Linear Units.

    +

It was introduced in the paper Gaussian Error Linear Units.
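
For reference, GELU is $x \Phi(x)$; a small hedged sketch checking an erf-based version against PyTorch's built-in:

import torch
import torch.nn.functional as F

def gelu(x: torch.Tensor) -> torch.Tensor:
    # GELU(x) = x * Phi(x), with Phi the standard normal CDF written via erf.
    return x * 0.5 * (1.0 + torch.erf(x / 2 ** 0.5))

x = torch.linspace(-3, 3, 7)
print(torch.allclose(gelu(x), F.gelu(x), atol=1e-6))  # True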

@option(FeedForwardConfigs.activation, 'GELU')
    @@ -294,7 +294,7 @@
                     

    GLU Variants

These are variants with gated hidden layers for the FFN
-as introduced in the paper GLU Variants Improve Transformer.
+as introduced in the paper GLU Variants Improve Transformer.
We have omitted the bias terms as specified in the paper.
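
A hedged sketch of one such variant (a GELU-activated gate, i.e. GEGLU) with the bias terms omitted as noted; the class and parameter names are illustrative:

import torch
import torch.nn as nn

class GatedFFN(nn.Module):
    # FFN(x) = (act(x W_g) * (x W_v)) W_o: a gated hidden layer, no bias terms.
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)
        self.value = nn.Linear(d_model, d_ff, bias=False)
        self.out = nn.Linear(d_ff, d_model, bias=False)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out(self.act(self.gate(x)) * self.value(x))

y = GatedFFN(d_model=512, d_ff=2048)(torch.randn(2, 10, 512))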

diff --git a/docs/transformers/fast_weights/index.html b/docs/transformers/fast_weights/index.html
index f78cf2c1..387b07ba 100644
--- a/docs/transformers/fast_weights/index.html
+++ b/docs/transformers/fast_weights/index.html
@@ -69,7 +69,7 @@

    Fast weights transformer

The paper
-Linear Transformers Are Secretly Fast Weight Memory Systems in PyTorch
+Linear Transformers Are Secretly Fast Weight Memory Systems in PyTorch
finds similarities between linear self-attention and fast weight systems and makes modifications to the self-attention update rule based on that. It also introduces a simpler, yet effective kernel function.
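
For intuition only (a hedged sketch, not this implementation's code), the modified update rule can be read as an outer-product memory write: part of the value previously stored under a key is replaced by the new value.

import torch

def fast_weight_step(W: torch.Tensor, k: torch.Tensor, v: torch.Tensor, beta: torch.Tensor) -> torch.Tensor:
    # W: (d_v, d_k) fast weight memory, k: (d_k,) key features (assumed normalized and
    # non-negative, e.g. after a kernel feature map), v: (d_v,) new value, beta: write strength.
    v_old = W @ k                                  # value currently stored under this key
    return W + beta * torch.outer(v - v_old, k)    # partially overwrite it with the new value

k = torch.rand(16)
W = fast_weight_step(torch.zeros(8, 16), k / k.sum(), torch.randn(8), torch.tensor(0.5))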

diff --git a/docs/transformers/fast_weights/readme.html b/docs/transformers/fast_weights/readme.html
index 48763408..271418c9 100644
--- a/docs/transformers/fast_weights/readme.html
+++ b/docs/transformers/fast_weights/readme.html
@@ -69,7 +69,7 @@

    Fast weights transformer

    This is an annotated implementation of the paper -Linear Transformers Are Secretly Fast Weight Memory Systems in PyTorch.

    +Linear Transformers Are Secretly Fast Weight Memory Systems in PyTorch.

    Here is the annotated implementation. Here are the training code and a notebook for training a fast weights transformer on the Tiny Shakespeare dataset.

diff --git a/docs/transformers/feed_forward.html b/docs/transformers/feed_forward.html
index 392c315d..e3ae3e1e 100644
--- a/docs/transformers/feed_forward.html
+++ b/docs/transformers/feed_forward.html
@@ -84,7 +84,7 @@ GELU (Gaussian Error Linear Unit) activation is also used instead of ReLU.
where $\Phi(x) = P(X \le x), X \sim \mathcal{N}(0,1)$

    Gated Linear Units

This is a generic implementation that supports different variants including
-Gated Linear Units (GLU).
+Gated Linear Units (GLU).
We have also implemented experiments on these: