commit a962cc320e
parent 724ff555b3
Author: Varuna Jayasiri
Date:   2021-01-03 14:28:07 +05:30

@@ -41,9 +41,9 @@ $z_h$ is usually a linear transformation of the output of the smaller recurrent
### Weight scaling instead of computing
Large recurrent networks have large dynamically computed parameters.
-Since these are calculated using a linear transformation of the
-and this requires even large weight tensor.
-That is when $\color{cyan}{W_h}$ has shape $N_h \times N_h$,
+These are calculated using a linear transformation of the feature vector $z$.
+And this transformation requires an even larger weight tensor.
+That is, when $\color{cyan}{W_h}$ has shape $N_h \times N_h$,
$W_{hz}$ will be $N_h \times N_h \times N_z$.
To overcome this, we compute the weight parameters of the recurrent network by
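
To make the size argument in the changed lines concrete, here is a minimal PyTorch sketch. The sizes `n_h` and `n_z` are arbitrary illustrative choices, not values from this file, and the scaling scheme is assumed to be the row-wise one from the HyperNetworks paper (Ha et al.) that this file annotates: generating $\color{cyan}{W_h}$ directly from $z$ needs $N_h \times N_h \times N_z$ parameters, while scaling the rows of a single static $N_h \times N_h$ matrix by $d(z) = W_{hz} z$ needs only $N_h \times N_h + N_h \times N_z$.

```python
import torch

n_h, n_z = 256, 32  # illustrative sizes, not taken from the file

# Direct generation: a linear map from z to every entry of W_h
# needs an (N_h * N_h) x N_z weight tensor.
w_hz_full = torch.randn(n_h * n_h, n_z)
print(w_hz_full.numel())  # 2,097,152 = 256 * 256 * 32

# Weight scaling (assumed row-wise, as in Ha et al.): keep one static
# N_h x N_h matrix and scale its rows by d(z) = W_hz z.
w_hd = torch.randn(n_h, n_h)   # static matrix whose rows get scaled
w_hz = torch.randn(n_h, n_z)   # maps z to the N_h row scales

z = torch.randn(n_z)
d = w_hz @ z                   # d(z), shape (n_h,)
w_h = d.unsqueeze(-1) * w_hd   # row i of w_hd scaled by d[i]
print(w_h.shape)               # torch.Size([256, 256])

print(w_hd.numel() + w_hz.numel())  # 73,728 = 256*256 + 256*32
```

Under these assumed sizes the dynamic weight tensor shrinks by a factor of roughly $N_z$, which is the point of scaling weights instead of computing them.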