An Attention Free Transformer

This is a PyTorch implementation of the paper An Attention Free Transformer.

This paper replaces the self-attention layer with a new, more efficient operation that has a memory complexity of O(Td), where T is the sequence length and d is the embedding dimensionality.
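As a rough sketch (not the repository's exact code), the operation can be written in PyTorch as below. Here q, k, v are the query, key, and value projections and w is the learned matrix of pairwise position biases; exp(K + w) factors into exp(w) · exp(K), which is what avoids materialising a T × T × d attention tensor.

```python
import torch

def aft_full(q, k, v, w):
    # Sketch of the Attention Free Transformer operation.
    # q, k, v: [seq_len, batch, d_model]; w: pairwise position biases [seq_len, seq_len].
    # Numerical stability (subtracting maxes before exp) is omitted for clarity.
    exp_w = torch.exp(w)  # [T, T]
    exp_k = torch.exp(k)  # [T, B, d]
    # Weighted average of values over key positions, per query position t
    num = torch.einsum('tj,jbd->tbd', exp_w, exp_k * v)
    den = torch.einsum('tj,jbd->tbd', exp_w, exp_k)
    # Element-wise gating by sigmoid(Q)
    return torch.sigmoid(q) * num / den
```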

The paper introduces AFT along with two variants, AFT-local and AFT-conv. Here we have implemented AFT-local, which pays attention to nearby tokens in an autoregressive model.
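The sketch below (class and parameter names are assumptions, not the repository's code) shows one way to build the position biases for AFT-local: learned biases are kept only within a local window, and future positions are masked with -inf so that exp() zeroes them out, keeping the model autoregressive. The resulting matrix can be passed as w to the aft_full sketch above.

```python
import torch
import torch.nn as nn

class LocalPositionBias(nn.Module):
    # Learned pairwise position biases for AFT-local (illustrative sketch).
    def __init__(self, max_len: int, window: int):
        super().__init__()
        self.bias = nn.Parameter(torch.zeros(max_len, max_len))
        pos = torch.arange(max_len)
        # Keep learned biases only for nearby positions: |t - t'| < window
        self.register_buffer('local', (pos[:, None] - pos[None, :]).abs() < window)
        # Causal mask: a token may only use the current and earlier positions
        self.register_buffer('causal', pos[None, :] <= pos[:, None])

    def forward(self, seq_len: int) -> torch.Tensor:
        w = self.bias[:seq_len, :seq_len] * self.local[:seq_len, :seq_len]
        # -inf becomes 0 after exp(), removing future tokens entirely
        return w.masked_fill(~self.causal[:seq_len, :seq_len], float('-inf'))
```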
