An Attention Free Transformer

This is a PyTorch implementation of the paper An Attention Free Transformer.

This paper replaces the self-attention layer with a new, more efficient operation that has a memory complexity of O(Td), where T is the sequence length and d is the embedding dimensionality.
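As a rough sketch (not the repository's exact code), the operation can be written in PyTorch as below. Here q, k, v are the query, key, and value projections and w is the learned matrix of pairwise position biases; exp(K + w) factors into exp(w) · exp(K), which is what avoids materialising a T × T × d attention tensor.

```python
import torch

def aft_full(q, k, v, w):
    # Sketch of the Attention Free Transformer operation.
    # q, k, v: [seq_len, batch, d_model]; w: pairwise position biases [seq_len, seq_len].
    # Numerical stability (subtracting maxes before exp) is omitted for clarity.
    exp_w = torch.exp(w)  # [T, T]
    exp_k = torch.exp(k)  # [T, B, d]
    # Weighted average of values over key positions, per query position t
    num = torch.einsum('tj,jbd->tbd', exp_w, exp_k * v)
    den = torch.einsum('tj,jbd->tbd', exp_w, exp_k)
    # Element-wise gating by sigmoid(Q)
    return torch.sigmoid(q) * num / den
```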

The paper introduces AFT along with two variants, AFT-local and AFT-conv. Here we have implemented AFT-local, which pays attention to nearby tokens in an autoregressive model.
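The sketch below (class and parameter names are assumptions, not the repository's code) shows one way to build the position biases for AFT-local: learned biases are kept only within a local window, and future positions are masked with -inf so that exp() zeroes them out, keeping the model autoregressive. The resulting matrix can be passed as w to the aft_full sketch above.

```python
import torch
import torch.nn as nn

class LocalPositionBias(nn.Module):
    # Learned pairwise position biases for AFT-local (illustrative sketch).
    def __init__(self, max_len: int, window: int):
        super().__init__()
        self.bias = nn.Parameter(torch.zeros(max_len, max_len))
        pos = torch.arange(max_len)
        # Keep learned biases only for nearby positions: |t - t'| < window
        self.register_buffer('local', (pos[:, None] - pos[None, :]).abs() < window)
        # Causal mask: a token may only use the current and earlier positions
        self.register_buffer('causal', pos[None, :] <= pos[:, None])

    def forward(self, seq_len: int) -> torch.Tensor:
        w = self.bias[:seq_len, :seq_len] * self.local[:seq_len, :seq_len]
        # -inf becomes 0 after exp(), removing future tokens entirely
        return w.masked_fill(~self.causal[:seq_len, :seq_len], float('-inf'))
```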
