Mirror of https://github.com/labmlai/annotated_deep_learning_paper_implementations.git, synced 2025-11-01 20:28:41 +08:00
pytorch link
@@ -9,7 +9,8 @@ summary: >
 # Capsule Networks
-This is a PyTorch implementation and tutorial of [Dynamic Routing Between Capsules](https://arxiv.org/abs/1710.09829).
+This is a [PyTorch](https://pytorch.org) implementation/tutorial of
+[Dynamic Routing Between Capsules](https://arxiv.org/abs/1710.09829).
 Capsule networks are a neural network architecture that embeds features
 as capsules and routes them with a voting mechanism to the next layer of capsules.
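As a concrete illustration of the capsule idea above, the paper's squashing non-linearity keeps a capsule's direction while mapping its length into (0, 1). The sketch below is a minimal stand-alone version; the shapes and the `eps` constant are illustrative assumptions, not code from this commit:

```python
import torch

def squash(s: torch.Tensor, dim: int = -1, eps: float = 1e-8) -> torch.Tensor:
    # Shrink capsule vectors so their length lies in (0, 1) while keeping direction
    norm_sq = (s ** 2).sum(dim=dim, keepdim=True)
    return (norm_sq / (1 + norm_sq)) * s / torch.sqrt(norm_sq + eps)

# Example: 10 capsules of 16 dimensions each
capsules = squash(torch.randn(10, 16))
```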
@@ -8,7 +8,7 @@ summary: >
 # Cycle GAN
-This is an implementation of paper
+This is a [PyTorch](https://pytorch.org) implementation/tutorial of paper
 [Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks](https://arxiv.org/abs/1703.10593).
 I've taken pieces of code from [eriklindernoren/PyTorch-GAN](https://github.com/eriklindernoren/PyTorch-GAN).
@@ -6,7 +6,7 @@ summary: A simple PyTorch implementation/tutorial of Deep Convolutional Generati
 # Deep Convolutional Generative Adversarial Networks (DCGAN)
-This is an implementation of paper
+This is a [PyTorch](https://pytorch.org) implementation of paper
 [Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/abs/1511.06434).
 This implementation is based on the [PyTorch DCGAN Tutorial](https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html).
@@ -7,7 +7,8 @@ summary: A PyTorch implementation/tutorial of HyperLSTM introduced in paper Hype
 # HyperNetworks - HyperLSTM
 We have implemented HyperLSTM introduced in paper
-[HyperNetworks](https://arxiv.org/abs/1609.09106), with annotations.
+[HyperNetworks](https://arxiv.org/abs/1609.09106), with annotations
+using [PyTorch](https://pytorch.org).
 [This blog post](https://blog.otoro.net/2016/09/28/hyper-networks/)
 by David Ha gives a good explanation of HyperNetworks.
@@ -17,7 +18,7 @@ Here's the link to code: [`experiment.py`](experiment.html)
 [](https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/hypernetworks/experiment.ipynb)
 [](https://web.lab-ml.com/run?uuid=9e7f39e047e811ebbaff2b26e3148b3d)
-HyperNetworks uses a smaller network to generate weights of a larger network.
+HyperNetworks use a smaller network to generate weights of a larger network.
 There are two variants: static hyper-networks and dynamic hyper-networks.
 Static HyperNetworks have a smaller network that generates weights (kernels)
 of a convolutional network. Dynamic HyperNetworks generate parameters of a
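To make the static variant described above concrete, here is a rough sketch of a layer whose convolution kernel is produced by a small linear network from a learned embedding. The class name, sizes, and single-layer generator are assumptions for illustration, not the repository's HyperLSTM code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StaticHyperConv(nn.Module):
    """Generate the kernel of a conv layer from a small per-layer embedding."""
    def __init__(self, z_dim: int, in_channels: int, out_channels: int, k: int = 3):
        super().__init__()
        self.z = nn.Parameter(torch.randn(z_dim))                     # layer embedding
        self.to_kernel = nn.Linear(z_dim, out_channels * in_channels * k * k)
        self.kernel_shape = (out_channels, in_channels, k, k)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        kernel = self.to_kernel(self.z).view(self.kernel_shape)       # generated weights
        return F.conv2d(x, kernel, padding=1)

y = StaticHyperConv(z_dim=8, in_channels=3, out_channels=16)(torch.randn(1, 3, 32, 32))
```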
@@ -5,6 +5,8 @@ summary: A simple PyTorch implementation/tutorial of Long Short-Term Memory (LST
 ---
 # Long Short-Term Memory (LSTM)
+This is a [PyTorch](https://pytorch.org) implementation of Long Short-Term Memory.
 """
 from typing import Optional, Tuple
@@ -11,7 +11,7 @@ This is based from AdaBelief
 of the paper
 [AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients](https://arxiv.org/abs/2010.07468).
-This is implemented here as an extension to [RAdam](radam.html).
+This is implemented in [PyTorch](https://pytorch.org) as an extension to [RAdam](radam.html).
 The main difference between the Adam optimizer and AdaBelief is
 how it calculates the adaptive learning rate;
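A stripped-down sketch of that difference: Adam accumulates the second moment of the raw gradient, while AdaBelief accumulates the second moment of the gradient's deviation from its running mean. Bias correction and weight decay are omitted, and the function below is illustrative rather than the repository's optimizer class:

```python
import torch

def adam_like_step(param, grad, m, v, lr=1e-3, beta1=0.9, beta2=0.999,
                   eps=1e-8, belief=False):
    # First moment (exponential moving average of the gradient)
    m.mul_(beta1).add_(grad, alpha=1 - beta1)
    if belief:
        # AdaBelief: second moment of the gradient's deviation from its mean
        v.mul_(beta2).addcmul_(grad - m, grad - m, value=1 - beta2)
    else:
        # Adam: second moment of the raw gradient
        v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    # Parameter update with the adaptive learning rate
    param.addcdiv_(m, v.sqrt().add_(eps), value=-lr)
```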
@@ -6,7 +6,7 @@ summary: A simple PyTorch implementation/tutorial of Adam optimizer
 # Adam Optimizer
-This is an implementation of popular optimizer *Adam* from paper
+This is a [PyTorch](https://pytorch.org) implementation of popular optimizer *Adam* from paper
 [Adam: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980v9).
 *Adam* update is,
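For reference, the update introduced by that last line (the equations themselves fall outside the hunk) is, in its standard form from the cited paper:

```latex
\begin{aligned}
m_t &\leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t \\
v_t &\leftarrow \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \\
\hat{m}_t &\leftarrow \frac{m_t}{1 - \beta_1^t}, \qquad
\hat{v}_t \leftarrow \frac{v_t}{1 - \beta_2^t} \\
\theta_t &\leftarrow \theta_{t-1} - \alpha \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
```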
@@ -6,7 +6,7 @@ summary: A simple PyTorch implementation/tutorial of AMSGrad optimizer.
 # AMSGrad
-This is an implementation of the paper
+This is a [PyTorch](https://pytorch.org) implementation of the paper
 [On the Convergence of Adam and Beyond](https://arxiv.org/abs/1904.09237).
 We implement this as an extension to our [Adam optimizer implementation](adam.html).
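The single change AMSGrad makes on top of Adam is to keep a running maximum of the second moment and use that in the update's denominator. A minimal sketch of that change, with illustrative names:

```python
import torch

def amsgrad_second_moment(v: torch.Tensor, v_max: torch.Tensor,
                          grad: torch.Tensor, beta2: float = 0.999) -> torch.Tensor:
    # Usual Adam second-moment update
    v.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)
    # AMSGrad: never let the effective second moment shrink
    torch.maximum(v_max, v, out=v_max)
    return v_max  # use v_max (not v) in the denominator of the parameter update
```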
@@ -8,7 +8,7 @@ summary: >
 # Noam Optimizer
-This is the implementation of optimizer introduced in the paper
+This is the [PyTorch](https://pytorch.org) implementation of the optimizer introduced in the paper
 [Attention Is All You Need](https://arxiv.org/abs/1706.03762).
 """
 from typing import Dict
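The schedule this optimizer applies on top of Adam, as given in Attention Is All You Need, can be sketched as follows; the `warmup` and `d_model` defaults are just example values:

```python
def noam_lr(step: int, d_model: int = 512, warmup: int = 4000, factor: float = 1.0) -> float:
    # Learning rate rises linearly for `warmup` steps, then decays as 1/sqrt(step)
    step = max(step, 1)
    return factor * d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

print(noam_lr(100), noam_lr(4000), noam_lr(100000))
```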
@@ -11,7 +11,8 @@ This implementation is based on
 of the paper
 [On the Variance of the Adaptive Learning Rate and Beyond](https://arxiv.org/abs/1908.03265).
-We have implemented it as an extension to [our AMSGrad implementation](amsgrad.html)
+We have implemented it in [PyTorch](https://pytorch.org)
+as an extension to [our AMSGrad implementation](amsgrad.html)
 thus requiring only the modifications to be implemented.
 The Adam optimizer sometimes converges to a bad local optimum during the initial stages of training;
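RAdam addresses that early-training problem by rectifying the variance of the adaptive learning rate. Below is a sketch of the rectification term from the paper; the repository's version is organized as an optimizer class, so treat this as an outline only:

```python
import math

def radam_rectification(step: int, beta2: float = 0.999):
    """Variance rectification term from the RAdam paper.

    Returns None when the adaptive learning rate is not yet tractable
    (early steps), in which case plain SGD-with-momentum is used instead.
    """
    rho_inf = 2.0 / (1.0 - beta2) - 1.0
    beta2_t = beta2 ** step
    rho_t = rho_inf - 2.0 * step * beta2_t / (1.0 - beta2_t)
    if rho_t <= 4.0:
        return None
    r2 = ((rho_t - 4) * (rho_t - 2) * rho_inf) / ((rho_inf - 4) * (rho_inf - 2) * rho_t)
    return math.sqrt(r2)
```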
@@ -6,7 +6,7 @@ summary: A simple PyTorch implementation/tutorial of Recurrent Highway Networks.
 # Recurrent Highway Networks
-This is an implementation of [Recurrent Highway Networks](https://arxiv.org/abs/1607.03474).
+This is a [PyTorch](https://pytorch.org) implementation of [Recurrent Highway Networks](https://arxiv.org/abs/1607.03474).
 """
 from typing import Optional
@@ -11,7 +11,7 @@ summary: >
 # Deep Q Networks (DQN)
-This is an implementation of paper
+This is a [PyTorch](https://pytorch.org) implementation of paper
 [Playing Atari with Deep Reinforcement Learning](https://arxiv.org/abs/1312.5602)
 along with [Dueling Network](model.html), [Prioritized Replay](replay_buffer.html)
 and Double Q Network.
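To make the Double Q Network part concrete: the online network picks the greedy action and the target network evaluates it. A minimal sketch with assumed tensor shapes, not the repository's agent code:

```python
import torch

def double_q_target(reward, done, next_q_online, next_q_target, gamma: float = 0.99):
    # Online network chooses the greedy action, target network evaluates it
    best_action = next_q_online.argmax(dim=-1, keepdim=True)
    next_value = next_q_target.gather(-1, best_action).squeeze(-1)
    return reward + gamma * (1.0 - done) * next_value

# Shapes: reward/done -> (batch,), next_q_* -> (batch, n_actions)
target = double_q_target(torch.zeros(4), torch.zeros(4), torch.randn(4, 2), torch.randn(4, 2))
```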
@@ -7,7 +7,8 @@ summary: >
 # Proximal Policy Optimization (PPO)
-This is a an implementation of [Proximal Policy Optimization - PPO](https://arxiv.org/abs/1707.06347).
+This is a [PyTorch](https://pytorch.org) implementation of
+[Proximal Policy Optimization - PPO](https://arxiv.org/abs/1707.06347).
 You can find an experiment that uses it [here](experiment.html).
 The experiment uses [Generalized Advantage Estimation](gae.html).
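The core of PPO is the clipped surrogate objective; a minimal sketch (the full implementation also handles the value and entropy losses):

```python
import torch

def ppo_clip_loss(log_pi: torch.Tensor, log_pi_old: torch.Tensor,
                  advantage: torch.Tensor, clip: float = 0.2) -> torch.Tensor:
    # Probability ratio between the new and old policies
    ratio = torch.exp(log_pi - log_pi_old)
    clipped = torch.clamp(ratio, 1.0 - clip, 1.0 + clip)
    # Take the pessimistic (minimum) surrogate and negate it to get a loss
    return -torch.min(ratio * advantage, clipped * advantage).mean()
```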
@@ -6,7 +6,8 @@ summary: A PyTorch implementation/tutorial of Generalized Advantage Estimation (
 # Generalized Advantage Estimation (GAE)
-This is an implementation of paper [Generalized Advantage Estimation](https://arxiv.org/abs/1506.02438).
+This is a [PyTorch](https://pytorch.org) implementation of paper
+[Generalized Advantage Estimation](https://arxiv.org/abs/1506.02438).
 """
 import numpy as np
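A minimal single-trajectory sketch of the estimator; it ignores episode-termination masks, which the full implementation handles:

```python
import numpy as np

def gae(rewards, values, last_value, gamma: float = 0.99, lam: float = 0.95):
    """Generalized Advantage Estimation over one trajectory (no episode resets)."""
    advantages = np.zeros(len(rewards), dtype=np.float64)
    next_value, next_advantage = last_value, 0.0
    for t in reversed(range(len(rewards))):
        # TD error at step t
        delta = rewards[t] + gamma * next_value - values[t]
        # Exponentially weighted sum of TD errors
        next_advantage = delta + gamma * lam * next_advantage
        advantages[t] = next_advantage
        next_value = values[t]
    return advantages
```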
@@ -8,7 +8,7 @@ summary: >
 # Sketch RNN
-This is an annotated implementation of the paper
+This is an annotated [PyTorch](https://pytorch.org) implementation of the paper
 [A Neural Representation of Sketch Drawings](https://arxiv.org/abs/1704.03477).
 Sketch RNN is a sequence-to-sequence variational auto-encoder.
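Since it is a variational auto-encoder, the encoder output is mapped to a latent vector with the usual reparameterization trick; a minimal sketch with assumed dimensions, not the repository's model code:

```python
import torch
import torch.nn as nn

class LatentSampler(nn.Module):
    """Map an encoder state to (mu, log_sigma) and sample z = mu + sigma * eps."""
    def __init__(self, enc_dim: int = 512, z_dim: int = 128):
        super().__init__()
        self.mu = nn.Linear(enc_dim, z_dim)
        self.log_sigma = nn.Linear(enc_dim, z_dim)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        mu, log_sigma = self.mu(h), self.log_sigma(h)
        return mu + torch.exp(log_sigma / 2) * torch.randn_like(mu)

z = LatentSampler()(torch.randn(1, 512))
```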
@@ -6,6 +6,9 @@ summary: Documented reusable implementation of the position wise feedforward net
 # Position-wise Feed-Forward Network (FFN)
+This is a [PyTorch](https://pytorch.org) implementation
+of the position-wise feedforward network used in transformers.
 FFN consists of two fully connected layers.
 The number of dimensions in the hidden layer, $d_{ff}$, is generally set to around
 four times that of the token embedding $d_{model}$.
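The two-layer structure described above, as a self-contained sketch with the conventional $d_{ff} = 4 \cdot d_{model}$ sizing (the dropout placement is an assumption):

```python
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    """Position-wise FFN: expand to d_ff, apply a non-linearity, project back."""
    def __init__(self, d_model: int = 512, d_ff: int = 2048, dropout: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

out = FeedForward()(torch.randn(10, 2, 512))  # (seq_len, batch, d_model)
```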
@@ -7,10 +7,10 @@ summary: >
 # Feedback Transformer
-This is an implementation of the paper
+This is a [PyTorch](https://pytorch.org) implementation of the paper
 [Accessing Higher-level Representations in Sequential Transformers with Feedback Memory](https://arxiv.org/abs/2002.09402).
-Normal transformers process tokens in parallel and each transformer layer pays attention
+Normal transformers process tokens in parallel. Each transformer layer pays attention
 to the outputs of the previous layer.
 The feedback transformer pays attention to the outputs of all layers at previous steps.
 So this adds recurrence, and we need to process tokens one by one.
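The recurrence described above comes from a memory vector that mixes the outputs of all layers at each step with learned weights, and later steps attend to these memory vectors. A minimal sketch of just that mixing step; shapes and names are assumptions:

```python
import torch

def feedback_memory(layer_outputs: torch.Tensor, weights: torch.Tensor) -> torch.Tensor:
    """Mix the outputs of all layers at one step into a single memory vector.

    layer_outputs: (n_layers + 1, batch, d_model) -- embedding plus each layer's output
    weights:       (n_layers + 1,) learned mixing logits
    """
    w = torch.softmax(weights, dim=0)
    return torch.einsum('l,lbd->bd', w, layer_outputs)

mem = feedback_memory(torch.randn(5, 2, 64), torch.randn(5))
```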
@@ -7,8 +7,9 @@ summary: >
 # GPT
-This is an tutorial of
-[OpenAI GPT architecture](https://openai.com/blog/better-language-models/).
+This is a tutorial/implementation of
+[OpenAI GPT architecture](https://openai.com/blog/better-language-models/)
+in [PyTorch](https://pytorch.org).
 We got a bunch of implementation details from
 [minGPT](https://github.com/karpathy/minGPT)
 by [@karpathy](https://twitter.com/karpathy).
@@ -11,7 +11,7 @@ summary: >
 # k-Nearest Neighbor Language Models
-This is an implementation of the paper
+This is a [PyTorch](https://pytorch.org) implementation of the paper
 [Generalization through Memorization: Nearest Neighbor Language Models](https://arxiv.org/abs/1911.00172).
 It uses k-nearest neighbors to improve perplexity of autoregressive transformer models.
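The perplexity improvement comes from interpolating a distribution built from retrieved nearest neighbours with the language model's own distribution. A minimal sketch; the interpolation weight `lam` is an example value:

```python
import torch

def knn_lm_probs(p_lm: torch.Tensor, p_knn: torch.Tensor, lam: float = 0.25) -> torch.Tensor:
    # p_lm:  (batch, vocab) probabilities from the transformer
    # p_knn: (batch, vocab) probabilities derived from nearest-neighbour targets
    return lam * p_knn + (1.0 - lam) * p_lm
```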
@@ -9,7 +9,8 @@ summary: >
 # Relative Multi-Headed Attention
 This is an implementation of
-[Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860).
+[Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context](https://arxiv.org/abs/1901.02860)
+in [PyTorch](https://pytorch.org).
 Transformer has a limited attention span,
 equal to the length of the sequence trained in parallel.
@@ -7,7 +7,7 @@ summary: >
 # Switch Transformer
-This is a miniature implementation of the paper
+This is a miniature [PyTorch](https://pytorch.org) implementation of the paper
 [Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity](https://arxiv.org/abs/2101.03961).
 Our implementation only has a few million parameters and doesn't do model parallel distributed training.
 It does single GPU training but we implement the concept of switching as described in the paper.
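The switching concept mentioned above routes each token to a single expert chosen by a learned gate. A minimal single-device sketch; linear "experts" stand in for the FFN experts, and there is no load-balancing loss here:

```python
import torch
import torch.nn as nn

class SwitchRouter(nn.Module):
    """Route each token to the one expert with the highest gate probability."""
    def __init__(self, d_model: int = 64, n_experts: int = 4):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(nn.Linear(d_model, d_model) for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (n_tokens, d_model)
        probs = torch.softmax(self.gate(x), dim=-1)
        top_prob, expert_idx = probs.max(dim=-1)
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # Scale by the gate probability so the router receives gradient
                out[mask] = top_prob[mask].unsqueeze(-1) * expert(x[mask])
        return out

y = SwitchRouter()(torch.randn(8, 64))
```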