pytorch link

Varuna Jayasiri
2021-01-30 13:38:15 +05:30
parent 3161c23592
commit 9b09a5f3d2
42 changed files with 530 additions and 508 deletions

View File

@@ -9,7 +9,8 @@ summary: >
# Capsule Networks
This is a PyTorch implementation and tutorial of [Dynamic Routing Between Capsules](https://arxiv.org/abs/1710.09829).
This is a [PyTorch](https://pytorch.org) implementation/tutorial of
[Dynamic Routing Between Capsules](https://arxiv.org/abs/1710.09829).
A capsule network is a neural network architecture that embeds features
as capsules and routes them with a voting mechanism to the next layer of capsules.
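
A minimal sketch of the routing-by-agreement step described above (tensor shapes and helper names are illustrative assumptions, not the repository's API):

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    # Squashing non-linearity: keeps the direction, maps the norm into [0, 1)
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1 + sq_norm)) * s / torch.sqrt(sq_norm + eps)

def dynamic_routing(u_hat: torch.Tensor, iterations: int = 3) -> torch.Tensor:
    # u_hat: predictions ("votes") from lower capsules,
    # shape [batch, n_lower, n_upper, d_upper]
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)  # routing logits
    for _ in range(iterations):
        c = F.softmax(b, dim=2)                      # coupling coefficients
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)     # weighted sum of votes
        v = squash(s)                                # upper-capsule outputs
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1) # reward agreeing votes
    return v
```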

View File

@@ -8,7 +8,7 @@ summary: >
# Cycle GAN
This is an implementation of paper
This is a [PyTorch](https://pytorch.org) implementation/tutorial of the paper
[Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://arxiv.org/abs/1703.10593).
I've taken pieces of code from [eriklindernoren/PyTorch-GAN](https://github.com/eriklindernoren/PyTorch-GAN).

View File

@@ -6,7 +6,7 @@ summary: A simple PyTorch implementation/tutorial of Deep Convolutional Generati
# Deep Convolutional Generative Adversarial Networks (DCGAN)
This is an implementation of paper
This is a [PyTorch](https://pytorch.org) implementation of the paper
[Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/abs/1511.06434).
This implementation is based on the [PyTorch DCGAN Tutorial](https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html).

View File

@@ -7,7 +7,8 @@ summary: A PyTorch implementation/tutorial of HyperLSTM introduced in paper Hype
# HyperNetworks - HyperLSTM
We have implemented HyperLSTM, introduced in the paper
[HyperNetworks](https://arxiv.org/abs/1609.09106), with annotations.
[HyperNetworks](https://arxiv.org/abs/1609.09106), with annotations
using [PyTorch](https://pytorch.org).
[This blog post](https://blog.otoro.net/2016/09/28/hyper-networks/)
by David Ha gives a good explanation of HyperNetworks.
@@ -17,7 +18,7 @@ Here's the link to code: [`experiment.py`](experiment.html)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/hypernetworks/experiment.ipynb)
[![View Run](https://img.shields.io/badge/labml-experiment-brightgreen)](https://web.lab-ml.com/run?uuid=9e7f39e047e811ebbaff2b26e3148b3d)
HyperNetworks uses a smaller network to generate weights of a larger network.
HyperNetworks use a smaller network to generate weights of a larger network.
There are two variants: static hyper-networks and dynamic hyper-networks.
Static HyperNetworks have a smaller network that generates weights (kernels)
of a convolutional network. Dynamic HyperNetworks generate parameters of a
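
To make the static variant concrete, here is a rough sketch under stated assumptions (the class name, embedding size, and generator shape are hypothetical, not the repository's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StaticHyperConv2d(nn.Module):
    """A convolution whose kernel is generated by a small hyper-network
    from a learned layer embedding `z` (sketch of the static variant)."""
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3, z_dim: int = 64):
        super().__init__()
        self.kernel_shape = (out_channels, in_channels, kernel_size, kernel_size)
        self.z = nn.Parameter(torch.randn(z_dim))          # layer embedding
        n_weights = out_channels * in_channels * kernel_size ** 2
        self.generator = nn.Sequential(                    # the smaller network
            nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, n_weights))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        kernel = self.generator(self.z).view(self.kernel_shape)  # generated weights
        return F.conv2d(x, kernel, padding=self.kernel_shape[-1] // 2)
```

A dynamic hyper-network instead conditions the weight generator on the current input or hidden state at every step, which is what HyperLSTM does for the LSTM weights.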

View File

@@ -5,6 +5,8 @@ summary: A simple PyTorch implementation/tutorial of Long Short-Term Memory (LST
---
# Long Short-Term Memory (LSTM)
This is a [PyTorch](https://pytorch.org) implementation of Long Short-Term Memory.
"""
from typing import Optional, Tuple

View File

@@ -11,7 +11,7 @@ This is based on AdaBelief
of the paper
[AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients](https://arxiv.org/abs/2010.07468).
This is implemented here as an extension to [RAdam](radam.html).
This is implemented in [PyTorch](https://pytorch.org) as an extension to [RAdam](radam.html).
The main difference between the Adam optimizer and AdaBelief is
how it calculates the adaptive learning rate;
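
To illustrate that difference, here is a sketch of the two second-moment accumulators (the function and its names are illustrative, not the repository's API): Adam scales steps by an exponential moving average of $g_t^2$, while AdaBelief uses an EMA of $(g_t - m_t)^2$, its "belief" in the observed gradient.

```python
import torch

def second_moment_updates(grad, m, v, s, beta1=0.9, beta2=0.999):
    """Compare the two accumulators for one step (illustrative only)."""
    m = beta1 * m + (1 - beta1) * grad             # first moment, same in both
    v = beta2 * v + (1 - beta2) * grad ** 2        # Adam: EMA of the squared gradient
    s = beta2 * s + (1 - beta2) * (grad - m) ** 2  # AdaBelief: EMA of (g - m)^2
    return m, v, s
```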

View File

@@ -6,7 +6,7 @@ summary: A simple PyTorch implementation/tutorial of Adam optimizer
# Adam Optimizer
This is an implementation of popular optimizer *Adam* from paper
This is a [PyTorch](https://pytorch.org) implementation of the popular optimizer *Adam* from the paper
[Adam: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980v9).
*Adam* update is,
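
For reference, the full update from the paper is

$$
\begin{aligned}
m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t \\
v_t &\leftarrow \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^t} \\
\theta_t &\leftarrow \theta_{t-1} - \alpha \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
$$

where $g_t$ is the gradient, $m_t$ and $v_t$ are the first and second moment estimates, and $\alpha$, $\beta_1$, $\beta_2$, $\epsilon$ are hyper-parameters.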

View File

@@ -6,7 +6,7 @@ summary: A simple PyTorch implementation/tutorial of AMSGrad optimizer.
# AMSGrad
This is an implementation of the paper
This is a [PyTorch](https://pytorch.org) implementation of the paper
[On the Convergence of Adam and Beyond](https://arxiv.org/abs/1904.09237).
We implement this as an extension to our [Adam optimizer implementation](adam.html).
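
The essential change relative to Adam is to keep a running maximum of the second-moment estimate and use it in the denominator, so the effective step size can never grow:

$$\hat{v}_t = \max(\hat{v}_{t-1}, v_t), \qquad \theta_t \leftarrow \theta_{t-1} - \alpha \frac{m_t}{\sqrt{\hat{v}_t} + \epsilon}$$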

View File

@@ -8,7 +8,7 @@ summary: >
# Noam Optimizer
This is the implementation of optimizer introduced in the paper
This is the [PyTorch](https://pytorch.org) implementation of the optimizer introduced in the paper
[Attention Is All You Need](https://arxiv.org/abs/1706.03762).
"""
from typing import Dict
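
The schedule itself is linear warm-up followed by inverse-square-root decay, scaled by $d_{model}^{-0.5}$. A minimal sketch (the function name and `factor` argument are assumptions, not the repository's API):

```python
def noam_lr(step: int, d_model: int = 512, warmup: int = 4000, factor: float = 1.0) -> float:
    """Learning rate at `step` for the schedule from "Attention Is All You Need"."""
    step = max(step, 1)  # avoid 0 ** -0.5 on the first step
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)
```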

View File

@@ -11,7 +11,8 @@ This implementation is based on
of the paper
[On the Variance of the Adaptive Learning Rate and Beyond](https://arxiv.org/abs/1908.03265).
We have implemented it as an extension to [our AMSGrad implementation](amsgrad.html)
We have implemented it in [PyTorch](https://pytorch.org)
as an extension to [our AMSGrad implementation](amsgrad.html)
so that only the modifications need to be implemented.
The Adam optimizer sometimes converges to bad local optima during the initial stages of training;

View File

@@ -6,7 +6,7 @@ summary: A simple PyTorch implementation/tutorial of Recurrent Highway Networks.
# Recurrent Highway Networks
This is an implementation of [Recurrent Highway Networks](https://arxiv.org/abs/1607.03474).
This is a [PyTorch](https://pytorch.org) implementation of [Recurrent Highway Networks](https://arxiv.org/abs/1607.03474).
"""
from typing import Optional

View File

@@ -11,7 +11,7 @@ summary: >
# Deep Q Networks (DQN)
This is an implementation of paper
This is a [PyTorch](https://pytorch.org) implementation of the paper
[Playing Atari with Deep Reinforcement Learning](https://arxiv.org/abs/1312.5602)
along with [Dueling Network](model.html), [Prioritized Replay](replay_buffer.html)
and Double Q Network.
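
For context, Double Q-learning changes only the bootstrap target: the online network selects the next action and the target network evaluates it,

$$y_t = r_{t+1} + \gamma \, Q_{\theta^-}\!\Big(s_{t+1}, \arg\max_{a} Q_{\theta}(s_{t+1}, a)\Big)$$

where $\theta$ are the online parameters and $\theta^-$ the periodically synchronized target parameters.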

View File

@@ -7,7 +7,8 @@ summary: >
# Proximal Policy Optimization (PPO)
This is a an implementation of [Proximal Policy Optimization - PPO](https://arxiv.org/abs/1707.06347).
This is a [PyTorch](https://pytorch.org) implementation of
[Proximal Policy Optimization - PPO](https://arxiv.org/abs/1707.06347).
You can find an experiment that uses it [here](experiment.html).
The experiment uses [Generalized Advantage Estimation](gae.html).
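
For reference, the clipped surrogate objective that PPO maximizes is

$$L^{CLIP}(\theta) = \mathbb{E}_t\Big[\min\big(r_t(\theta)\,\hat{A}_t,\; \mathrm{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)\,\hat{A}_t\big)\Big], \qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)}$$

where $\hat{A}_t$ is the advantage estimate (computed here with GAE) and $\epsilon$ is the clipping range.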

View File

@@ -6,7 +6,8 @@ summary: A PyTorch implementation/tutorial of Generalized Advantage Estimation (
# Generalized Advantage Estimation (GAE)
This is an implementation of paper [Generalized Advantage Estimation](https://arxiv.org/abs/1506.02438).
This is a [PyTorch](https://pytorch.org) implementation of the paper
[Generalized Advantage Estimation](https://arxiv.org/abs/1506.02438).
"""
import numpy as np
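
As a quick sketch of what GAE computes (the function signature is an assumption, not the repository's API), the advantage is an exponentially weighted sum of TD errors accumulated backwards over a trajectory:

```python
import numpy as np

def gae_advantages(rewards: np.ndarray, values: np.ndarray,
                   gamma: float = 0.99, lam: float = 0.95) -> np.ndarray:
    """`values` has one extra entry for the bootstrap value of the final state;
    episode-termination handling is omitted for brevity."""
    advantages = np.zeros(len(rewards))
    last = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error delta_t
        last = delta + gamma * lam * last                       # A_t = delta_t + gamma * lambda * A_{t+1}
        advantages[t] = last
    return advantages
```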

View File

@@ -8,7 +8,7 @@ summary: >
# Sketch RNN
This is an annotated implementation of the paper
This is an annotated [PyTorch](https://pytorch.org) implementation of the paper
[A Neural Representation of Sketch Drawings](https://arxiv.org/abs/1704.03477).
Sketch RNN is a sequence-to-sequence variational auto-encoder.

View File

@@ -6,6 +6,9 @@ summary: Documented reusable implementation of the position wise feedforward net
# Position-wise Feed-Forward Network (FFN)
This is a [PyTorch](https://pytorch.org) implementation
of the position-wise feedforward network used in transformers.
The FFN consists of two fully connected layers.
The number of dimensions in the hidden layer, $d_{ff}$, is generally set to around
four times that of the token embedding, $d_{model}$.
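
In code this amounts to something like the following sketch (class and argument names are illustrative, not necessarily the repository's):

```python
import torch
import torch.nn as nn

class PositionWiseFFN(nn.Module):
    """Two fully connected layers applied identically at every position."""
    def __init__(self, d_model: int = 512, d_ff: int = 2048, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),   # expand: d_ff is roughly 4 x d_model
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model),   # project back to d_model
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [..., d_model]; the same weights are applied at every position
        return self.net(x)
```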

View File

@@ -7,10 +7,10 @@ summary: >
# Feedback Transformer
This is an implementation of the paper
This is a [PyTorch](https://pytorch.org) implementation of the paper
[Accessing Higher-level Representations in Sequential Transformers with Feedback Memory](https://arxiv.org/abs/2002.09402).
Normal transformers process tokens in parallel and each transformer layer pays attention
Normal transformers process tokens in parallel. Each transformer layer pays attention
to the outputs of the previous layer.
The Feedback Transformer pays attention to the outputs of all layers in previous steps.
This adds recurrence, so tokens have to be processed one by one.

View File

@@ -7,8 +7,9 @@ summary: >
# GPT
This is an tutorial of
[OpenAI GPT architecture](https://openai.com/blog/better-language-models/).
This is a tutorial/implementation of
[OpenAI GPT architecture](https://openai.com/blog/better-language-models/)
in [PyTorch](https://pytorch.org).
We got a bunch of implementation details from
[minGPT](https://github.com/karpathy/minGPT)
by [@karpathy](https://twitter.com/karpathy).

View File

@@ -11,7 +11,7 @@ summary: >
# k-Nearest Neighbor Language Models
This is an implementation of the paper
This is a [PyTorch](https://pytorch.org) implementation of the paper
[Generalization through Memorization: Nearest Neighbor Language Models](https://arxiv.org/abs/1911.00172).
It uses k-nearest neighbors to improve the perplexity of autoregressive transformer models.
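
The core idea, sketched under stated assumptions (the helper, its arguments, and the datastore lookup are hypothetical): build a next-token distribution from the retrieved neighbours' stored target tokens, weighted by distance, and interpolate it with the model's own distribution.

```python
import torch
import torch.nn.functional as F

def knn_lm_distribution(p_model: torch.Tensor,            # [vocab] model softmax
                        neighbour_targets: torch.Tensor,  # [k] long tensor of stored token ids
                        neighbour_dists: torch.Tensor,    # [k] distances in key space
                        lam: float = 0.25) -> torch.Tensor:
    weights = F.softmax(-neighbour_dists, dim=-1)          # closer neighbours get more weight
    p_knn = torch.zeros_like(p_model).scatter_add_(0, neighbour_targets, weights)
    return lam * p_knn + (1 - lam) * p_model               # interpolate the two distributions
```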

View File

@@ -9,7 +9,8 @@ summary: >
# Relative Multi-Headed Attention
This is an implementation of
[Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860).
[Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860)
in [PyTorch](https://pytorch.org).
The transformer has a limited attention span,
equal to the length of the sequence trained in parallel.

View File

@@ -7,7 +7,7 @@ summary: >
# Switch Transformer
This is a miniature implementation of the paper
This is a miniature [PyTorch](https://pytorch.org) implementation of the paper
[Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://arxiv.org/abs/2101.03961).
Our implementation has only a few million parameters and doesn't do model-parallel distributed training.
It trains on a single GPU, but we implement the concept of switching as described in the paper.
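
As a rough illustration of that switching (a sketch only; the expert FFN shape is simplified and the paper's load-balancing loss is omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwitchFFN(nn.Module):
    """Each token is routed to exactly one expert FFN, chosen by the argmax of a
    learned router; scaling by the router probability keeps routing differentiable."""
    def __init__(self, d_model: int, n_experts: int, d_ff: int = 2048):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: [n_tokens, d_model] (flatten batch and sequence dimensions first)
        probs = F.softmax(self.router(x), dim=-1)
        top_p, top_i = probs.max(dim=-1)               # one expert per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_i == e
            if mask.any():
                out[mask] = expert(x[mask])
        return out * top_p.unsqueeze(-1)               # scale by router probability
```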