fix
@@ -41,9 +41,9 @@ $z_h$ is usually a linear transformation of the output of the smaller recurrent
 ### Weight scaling instead of computing
 
 Large recurrent networks have large dynamically computed parameters.
-Since these are calculated using a linear transformation of the
-and this requires even large weight tensor.
-That is when $\color{cyan}{W_h}$ has shape $N_h \times N_h$,
+These are calculated using a linear transformation of feature vector $z$.
+And this transformation requires an even larger weight tensor.
+That is, when $\color{cyan}{W_h}$ has shape $N_h \times N_h$,
 $W_{hz}$ will be $N_h \times N_h \times N_z$.
 
 To overcome this, we compute the weight parameters of the recurrent network by
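
The contrast described in the changed lines can be sketched in code. This is a minimal PyTorch illustration, not code from this commit: the sizes `n_h`, `n_z` and all tensor names are hypothetical, and it only shows why a full $N_h \times N_h \times N_z$ tensor is avoidable by scaling the rows of a single static matrix with a vector computed from $z$.

```python
import torch

n_h, n_z = 16, 4                     # hypothetical hidden and feature sizes

# Naive dynamic weights: mapping z to a full weight matrix needs an
# N_h x N_h x N_z parameter tensor.
w_hz_full = torch.randn(n_h, n_h, n_z)
z = torch.randn(n_z)
w_dynamic = torch.einsum('ijk,k->ij', w_hz_full, z)   # N_h x N_h

# Weight scaling instead: keep one static N_h x N_h matrix and scale its
# rows by a vector d(z) computed with a much smaller N_h x N_z projection.
w_h = torch.randn(n_h, n_h)          # static weight matrix
w_hz = torch.randn(n_h, n_z)         # small projection of z
d = w_hz @ z                         # per-row scaling vector d(z), shape (N_h,)
w_scaled = d.unsqueeze(-1) * w_h     # row i of w_h scaled by d_i(z)

print(w_dynamic.shape, w_scaled.shape)   # both torch.Size([16, 16])
```

The parameter count drops from $N_h \times N_h \times N_z$ to $N_h \times N_h + N_h \times N_z$, which is the point the corrected text is making.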