Mirror of https://github.com/labmlai/annotated_deep_learning_paper_implementations.git (synced 2025-08-26 08:41:23 +08:00)
typos in readmes
@@ -81,7 +81,7 @@ equal to the length of the sequence trained in parallel.
 All these positions have a fixed positional encoding.
 Transformer XL increases this attention span by letting
 each of the positions pay attention to precalculated past embeddings.
-For instance if the context length is $l$ it will keep the embeddings of
+For instance if the context length is $l$, it will keep the embeddings of
 all layers for previous batch of length $l$ and feed them to current step.
 If we use fixed-positional encodings these pre-calculated embeddings will have
 the same positions as the current context.
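The README text in this hunk describes how Transformer XL extends the attention span by caching the embeddings of the previous batch of length $l$ and letting the current step attend to them. As a rough single-head illustration of that idea (not the repository's actual implementation, which uses multi-head attention and relative positional encodings; the function and tensor names here are hypothetical), attending over cached memory could be sketched as:

```python
import torch

def attend_with_memory(current: torch.Tensor, memory: torch.Tensor) -> torch.Tensor:
    """Sketch of attention over cached past embeddings (Transformer XL style).

    `current` holds the embeddings of the current batch, shape [l, d].
    `memory` holds the pre-calculated embeddings of the previous batch,
    shape [l, d]. Keys and values span both, so every current position
    can attend to up to 2 * l positions.
    """
    d = current.shape[-1]
    # Queries come only from the current step.
    q = current                                   # [l, d]
    # Keys/values include the cached memory followed by the current step.
    kv = torch.cat([memory, current], dim=0)      # [2l, d]
    scores = q @ kv.t() / d ** 0.5                # [l, 2l]
    attn = torch.softmax(scores, dim=-1)
    return attn @ kv                              # [l, d]

# After each step the current embeddings would be detached and cached,
# becoming the memory for the next batch:
#   memory = current.detach()
```

As the last two lines of the hunk note, if fixed positional encodings were used, the cached embeddings would carry the same position indices as the current context; Transformer XL avoids this clash by using relative positional encodings in the attention computation, which the sketch above omits.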