typos in readmes

Varuna Jayasiri
2021-02-19 09:23:27 +05:30
parent 3b1e75da62
commit ccb9ee2e4c
8 changed files with 142 additions and 138 deletions


@@ -81,7 +81,7 @@ equal to the length of the sequence trained in parallel.
 All these positions have a fixed positional encoding.
 Transformer XL increases this attention span by letting
 each of the positions pay attention to precalculated past embeddings.
-For instance if the context length is $l$ it will keep the embeddings of
+For instance if the context length is $l$, it will keep the embeddings of
 all layers for previous batch of length $l$ and feed them to current step.
 If we use fixed-positional encodings these pre-calculated embeddings will have
 the same positions as the current context.
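
The paragraph above describes the Transformer XL memory mechanism: hidden states of every layer from the previous segment of length $l$ are kept and the current segment attends over them as well. Below is a minimal, illustrative sketch of that idea in PyTorch, not the repository's actual implementation; all names (`step`, `mems`, `seq_len`, `d_model`, `n_layers`) are hypothetical. Positional encodings are omitted entirely; as the text notes, fixed positional encodings would give the stored embeddings the same positions as the current context, which is what motivates relative positional encoding.

```python
import torch

seq_len = 4   # context length l
d_model = 8
n_layers = 2

# Pre-calculated embeddings of all layers from the previous batch of length l
mems = [torch.zeros(seq_len, d_model) for _ in range(n_layers)]


def step(x, mems):
    """One forward step over the current segment `x` of shape [seq_len, d_model]."""
    new_mems = []
    h = x
    for layer in range(n_layers):
        # Queries come from the current segment only; keys/values also cover
        # the stored memory from the previous segment (no gradient through it).
        kv = torch.cat([mems[layer].detach(), h], dim=0)       # [2 * l, d_model]
        attn = torch.softmax(h @ kv.t() / d_model ** 0.5, dim=-1)
        h = attn @ kv                                           # [l, d_model]
        # Keep the current hidden states as memory for the next step
        new_mems.append(h)
    return h, new_mems


x = torch.randn(seq_len, d_model)
out, mems = step(x, mems)
```

With this scheme each position can attend over up to $2l$ positions per layer (and further back through deeper layers), instead of being limited to the $l$ positions trained in parallel.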