Author: Varuna Jayasiri
Date: 2021-01-03 14:28:07 +05:30
Parent: 724ff555b3
Commit: a962cc320e


@@ -41,9 +41,9 @@ $z_h$ is usually a linear transformation of the output of the smaller recurrent
 ### Weight scaling instead of computing
 Large recurrent networks have large dynamically computed parameters.
-Since these are calculated using a linear transformation of the
-and this requires even large weight tensor.
-That is when $\color{cyan}{W_h}$ has shape $N_h \times N_h$,
+These are calculated using a linear transformation of the feature vector $z$,
+and this transformation requires an even larger weight tensor.
+That is, when $\color{cyan}{W_h}$ has shape $N_h \times N_h$,
 $W_{hz}$ will be $N_h \times N_h \times N_z$.
 To overcome this, we compute the weight parameters of the recurrent network by
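To make the scaling idea concrete, here is a minimal PyTorch sketch. The class name, the dimensions, and the exact form $W = \mathrm{diag}(d(z))\,\color{cyan}{W_h}$ with $d(z) = W_{dz} z$ are illustrative assumptions based on the HyperNetworks formulation, not code from this commit:

```python
import torch
import torch.nn as nn


class RowScaledWeight(nn.Module):
    """Weight scaling instead of computing the full weight.

    A full hypernetwork would need an N_h x N_h x N_z tensor to map
    z to a weight matrix. Here we keep one static W_h and scale its
    rows by d(z) = W_dz z, so the parameter count is
    N_h * N_h + N_h * N_z instead of N_h * N_h * N_z.
    """

    def __init__(self, n_h: int, n_z: int):
        super().__init__()
        # Static recurrent weight, shape (N_h, N_h)
        self.w_h = nn.Parameter(torch.randn(n_h, n_h) / n_h ** 0.5)
        # Small projection producing one scale per row, shape (N_h, N_z)
        self.w_dz = nn.Linear(n_z, n_h, bias=False)

    def forward(self, z: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        d = self.w_dz(z)                # row scales d(z), shape (N_h,)
        w = d.unsqueeze(-1) * self.w_h  # diag(d) @ W_h: row i scaled by d[i]
        return w @ h                    # dynamically scaled recurrent step


# Usage: a 512-unit recurrent network with a 64-dim hypernetwork feature z
layer = RowScaledWeight(n_h=512, n_z=64)
out = layer(torch.randn(64), torch.randn(512))  # shape (512,)
```

Note that $\mathrm{diag}(d)\,\color{cyan}{W_h}\,h = d \odot (\color{cyan}{W_h} h)$, so the scaling can equally be applied elementwise to the matrix-vector product without materializing the scaled matrix. For $N_h = 512$ and $N_z = 64$, this reduces the parameters for one weight from roughly $16.8$M ($N_h \times N_h \times N_z$) to about $0.3$M ($N_h \times N_h + N_h \times N_z$).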