Transformers

This module contains PyTorch implementations and explanations of the original Transformer from the paper Attention Is All You Need, along with derivatives and enhancements of it.

GPT Architecture

This is an implementation of the GPT-2 architecture.

kNN-LM

This is an implementation of the paper Generalization through Memorization: Nearest Neighbor Language Models.
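As a rough illustration of the idea in that paper, the next-token distribution of a kNN-LM interpolates the language model's prediction with a distribution built from retrieved nearest neighbours. The sketch below is written in plain Python rather than PyTorch to stay dependency-free; the function name `knn_lm_interpolate`, the dictionary-based representation, and the default value of `lam` are assumptions made for this example and are not taken from this module.

```python
import math


def knn_lm_interpolate(p_lm, neighbours, lam=0.25):
    """Mix a language-model distribution with a nearest-neighbour distribution.

    p_lm:       dict mapping token -> probability from the language model
    neighbours: list of (distance, token) pairs retrieved from a datastore
    lam:        interpolation weight given to the kNN distribution (hypothetical default)
    """
    if not neighbours:
        # Nothing was retrieved, so fall back to the language model alone.
        return dict(p_lm)

    # Softmax over negative distances: closer neighbours get more weight.
    weights = [math.exp(-dist) for dist, _ in neighbours]
    total = sum(weights)

    # Aggregate the weight of neighbours that share the same target token.
    p_knn = {}
    for weight, (_, token) in zip(weights, neighbours):
        p_knn[token] = p_knn.get(token, 0.0) + weight / total

    # Linear interpolation of the two distributions.
    vocab = set(p_lm) | set(p_knn)
    return {
        token: lam * p_knn.get(token, 0.0) + (1.0 - lam) * p_lm.get(token, 0.0)
        for token in vocab
    }
```

In practice the neighbours would come from a search over stored context embeddings; here they are passed in directly to keep the sketch short.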

Feedback Transformer

This is an implementation of the paper Accessing Higher-level Representations in Sequential Transformers with Feedback Memory.

Switch Transformer

This is a miniature implementation of the paper Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. Our implementation has only a few million parameters and does not do model-parallel distributed training. It trains on a single GPU, but it implements the switching concept as described in the paper.
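The switching idea can be sketched as top-1 routing: a router scores each token against every expert, the token is sent only to the highest-scoring expert, and that expert's output is scaled by the routing probability. The following is a minimal, framework-free sketch, not the implementation in this module; the names `switch_route`, `router_logits`, and `experts` are hypothetical, and capacity limits and the load-balancing loss from the paper are left out.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def switch_route(tokens, router_logits, experts):
    """Route each token to exactly one expert (top-1 routing).

    tokens:        list of token vectors (lists of floats)
    router_logits: one list of per-expert scores for each token
    experts:       list of callables, each mapping a vector to a vector
    """
    outputs, chosen = [], []
    for x, logits in zip(tokens, router_logits):
        probs = softmax(logits)
        # Pick the single expert with the highest routing probability.
        idx = max(range(len(probs)), key=probs.__getitem__)
        y = experts[idx](x)
        # Scale the expert output by its routing probability.
        outputs.append([probs[idx] * v for v in y])
        chosen.append(idx)
    return outputs, chosen
```

For example, with two toy experts such as `lambda x: [2 * v for v in x]` and `lambda x: [-v for v in x]`, each token is processed by only one of them, which is what keeps the compute per token constant as the number of experts grows.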

from .configs import TransformerConfigs
from .models import TransformerLayer, Encoder, Decoder, Generator, EncoderDecoder
from .mha import MultiHeadAttention
from .relative_mha import RelativeMultiHeadAttention