Author: Varuna Jayasiri
Date: 2021-01-03 14:28:07 +05:30
Parent: 724ff555b3
Commit: a962cc320e


@@ -41,9 +41,9 @@ $z_h$ is usually a linear transformation of the output of the smaller recurrent
 ### Weight scaling instead of computing
 Large recurrent networks have large dynamically computed parameters.
-Since these are calculated using a linear transformation of the
-and this requires even large weight tensor.
-That is when $\color{cyan}{W_h}$ has shape $N_h \times N_h$,
+These are calculated using a linear transformation of the feature vector $z$,
+and this transformation requires an even larger weight tensor.
+That is, when $\color{cyan}{W_h}$ has shape $N_h \times N_h$,
 $W_{hz}$ will be $N_h \times N_h \times N_z$.
 To overcome this, we compute the weight parameters of the recurrent network by
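To make the scaling idea concrete, here is a minimal PyTorch sketch. The class name, the dimensions, and the exact form $W = \mathrm{diag}(d(z))\,\color{cyan}{W_h}$ with $d(z) = W_{dz} z$ are illustrative assumptions based on the HyperNetworks formulation, not code from this commit:

```python
import torch
import torch.nn as nn


class RowScaledWeight(nn.Module):
    """Weight scaling instead of computing the full weight.

    A full hypernetwork would need an N_h x N_h x N_z tensor to map
    z to a weight matrix. Here we keep one static W_h and scale its
    rows by d(z) = W_dz z, so the parameter count is
    N_h * N_h + N_h * N_z instead of N_h * N_h * N_z.
    """

    def __init__(self, n_h: int, n_z: int):
        super().__init__()
        # Static recurrent weight, shape (N_h, N_h)
        self.w_h = nn.Parameter(torch.randn(n_h, n_h) / n_h ** 0.5)
        # Small projection producing one scale per row, shape (N_h, N_z)
        self.w_dz = nn.Linear(n_z, n_h, bias=False)

    def forward(self, z: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        d = self.w_dz(z)                # row scales d(z), shape (N_h,)
        w = d.unsqueeze(-1) * self.w_h  # diag(d) @ W_h: row i scaled by d[i]
        return w @ h                    # dynamically scaled recurrent step


# Usage: a 512-unit recurrent network with a 64-dim hypernetwork feature z
layer = RowScaledWeight(n_h=512, n_z=64)
out = layer(torch.randn(64), torch.randn(512))  # shape (512,)
```

Note that $\mathrm{diag}(d)\,\color{cyan}{W_h}\,h = d \odot (\color{cyan}{W_h} h)$, so the scaling can equally be applied elementwise to the matrix-vector product without materializing the scaled matrix. For $N_h = 512$ and $N_z = 64$, this reduces the parameters for one weight from roughly $16.8$M ($N_h \times N_h \times N_z$) to about $0.3$M ($N_h \times N_h + N_h \times N_z$).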