fix
@@ -41,9 +41,9 @@ $z_h$ is usually a linear transformation of the output of the smaller recurrent
 ### Weight scaling instead of computing

 Large recurrent networks have large dynamically computed parameters.
-Since these are calculated using a linear transformation of the
-and this requires even large weight tensor.
-That is when $\color{cyan}{W_h}$ has shape $N_h \times N_h$,
+These are calculated using a linear transformation of feature vector $z$.
+And this transformation requires an even large weight tensor.
+That is, when $\color{cyan}{W_h}$ has shape $N_h \times N_h$,
 $W_{hz}$ will be $N_h \times N_h \times N_z$.

 To overcome this, we compute the weight parameters of the recurrent network by
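To make the size argument in the corrected text concrete, here is a minimal PyTorch sketch of the weight-scaling idea (illustrative only, not the repository's implementation; the sizes `n_h`, `n_z` and all tensor names are assumed for the example). Mapping $z$ to a full weight matrix needs an $N_h \times N_h \times N_z$ tensor, while scaling the rows of a static matrix needs only an $N_h \times N_z$ projection:

```python
# Sketch of hypernetwork weight scaling vs. full weight computation.
# All shapes and names here are assumed for illustration.
import torch

n_h, n_z = 8, 4  # hidden size N_h and feature-vector size N_z (assumed values)
z = torch.randn(n_z)

# Naive approach: a full N_h x N_h x N_z tensor maps z to a whole weight matrix,
# costing N_h^2 * N_z parameters.
w_hz_full = torch.randn(n_h, n_h, n_z)
w_h_naive = torch.einsum('ijk,k->ij', w_hz_full, z)  # shape (N_h, N_h)

# Weight scaling: keep one static N_h x N_h matrix and compute only a
# row-scaling vector d(z) of size N_h from z, costing N_h * N_z parameters.
w_h = torch.randn(n_h, n_h)   # static weight matrix
w_hz = torch.randn(n_h, n_z)  # much smaller projection: N_h x N_z
d = w_hz @ z                  # scaling vector, shape (N_h,)
w_h_scaled = d.unsqueeze(-1) * w_h  # scale row i of W_h by d_i

print(w_h_naive.shape, w_h_scaled.shape)  # both torch.Size([8, 8])
```

Row scaling cuts the hypernetwork's parameter count from $N_h^2 N_z$ to $N_h N_z$ while still letting $z$ modulate every row of $W_h$.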