mirror of https://github.com/labmlai/annotated_deep_learning_paper_implementations.git
typo
@@ -90,7 +90,7 @@ Instead it keeps weighted sum of the output of all layers.
 This reduces the memory used for caching during prediction.
 The first half of this file implements this.</p>
 <p>The updated feedback transformer shares weights $W^l_k$ and $W^l_v$ used
-to calculate keys and values for among the layers.
+to calculate keys and values among the layers.
 We then calculate the keys and values for each step only once and keep
 them cached.
 The <a href="#shared_kv">second half</a> of this file implements this.
@@ -28,7 +28,7 @@ This reduces the memory used for caching during prediction.
 The first half of this file implements this.
 
 The updated feedback transformer shares weights $W^l_k$ and $W^l_v$ used
-to calculate keys and values for among the layers.
+to calculate keys and values among the layers.
 We then calculate the keys and values for each step only once and keep
 them cached.
 The [second half](#shared_kv) of this file implements this.
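
The paragraph being corrected describes a concrete optimization: the key and value projection weights $W^l_k$ and $W^l_v$ are shared across layers, so each step's keys and values are computed once and cached instead of being recomputed per layer. Below is a minimal sketch of that caching scheme, assuming PyTorch; the names `SharedKVCache`, `key_proj`, and `value_proj` are hypothetical, not the repository's actual identifiers.

```python
import torch
import torch.nn as nn


class SharedKVCache(nn.Module):
    """Keys and values computed once per step with layer-shared weights.

    In the unshared version, each layer l has its own W^l_k and W^l_v,
    so every layer projects and caches its own keys and values. Sharing
    one W_k and one W_v across layers means a single cached copy of each
    step's key and value serves all layers.
    """

    def __init__(self, d_model: int, d_k: int):
        super().__init__()
        # One shared projection pair instead of one pair per layer.
        self.key_proj = nn.Linear(d_model, d_k, bias=False)
        self.value_proj = nn.Linear(d_model, d_k, bias=False)
        self.keys: list = []
        self.values: list = []

    def append(self, memory_step: torch.Tensor):
        # Project the new step's memory vector once; every layer reuses it.
        self.keys.append(self.key_proj(memory_step))
        self.values.append(self.value_proj(memory_step))

    def get(self):
        # Stack the cached projections over all previous steps,
        # giving the keys and values to attend over.
        return torch.stack(self.keys), torch.stack(self.values)
```

In use, `cache.append(mem)` would be called once per step and `keys, values = cache.get()` inside every layer's attention, so the cache memory during prediction shrinks by roughly a factor of the number of layers compared with per-layer caching.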