diff --git a/docs/recurrent_highway_networks/index.html b/docs/recurrent_highway_networks/index.html
index a76784eb..19c22847 100644
--- a/docs/recurrent_highway_networks/index.html
+++ b/docs/recurrent_highway_networks/index.html
@@ -109,10 +109,10 @@ c_d^t &= \sigma(lin_{cs}^d(s_d^t))

$\odot$ stands for element-wise multiplication.

Here we have made a couple of changes to notations from the paper.
-To avoid confusion with time, the gate is represented with $g$,
+To avoid confusion with time, gate is represented with $g$,
which was $t$ in the paper.
To avoid confusion with multiple layers we use $d$ for depth and $D$ for
-total depth instead of $l$ and $L$ from paper.
+total depth instead of $l$ and $L$ from the paper.

We have also replaced the weight matrices and bias vectors from the equations with linear transforms, because that’s how the implementation is going to look.

We implement weight tying, as described in the paper: $c_d^t = 1 - g_d^t$.
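As a rough illustration of what weight tying means for a single depth step, the sketch below assumes the highway-style state update $s_d^t = h_d^t \odot g_d^t + s_{d-1}^t \odot c_d^t$ from the paper and substitutes $c_d^t = 1 - g_d^t$. The function name `tied_highway_update`, its arguments, and the numbers are hypothetical, and plain Python lists stand in for tensors; this is not the code from the page being changed.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, used here to squash the gate pre-activation."""
    return 1.0 / (1.0 + math.exp(-x))


def tied_highway_update(h, g_pre, s_prev):
    """One depth step with weight tying: s = h * g + s_prev * (1 - g).

    h      : candidate values h_d^t (assumed already passed through tanh)
    g_pre  : gate pre-activations, before the sigmoid
    s_prev : state from the previous depth, s_{d-1}^t
    """
    s = []
    for h_i, a_i, s_i in zip(h, g_pre, s_prev):
        g_i = sigmoid(a_i)  # gate g_d^t
        c_i = 1.0 - g_i     # carry gate, tied to g instead of being learned
        s.append(h_i * g_i + s_i * c_i)  # element-wise update
    return s


# Hypothetical values, chosen only for illustration
h = [0.5, -0.5]
s_prev = [1.0, 1.0]
new_state = tied_highway_update(h, [0.0, 0.0], s_prev)
```

Because the carry gate is derived from $g$, each element of the new state is a convex combination of the candidate and the previous state, so no separate carry transform is needed.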

@@ -127,7 +127,7 @@ linear transforms, because that’s how the implementation is going to look #

input_size is the feature length of the input and hidden_size is
-feature length of the cell.
+the feature length of the cell.
depth is $D$.

diff --git a/docs/sitemap.xml b/docs/sitemap.xml
index c8d5eea3..0eee6242 100644
--- a/docs/sitemap.xml
+++ b/docs/sitemap.xml
@@ -491,7 +491,7 @@
https://nn.labml.ai/recurrent_highway_networks/index.html
- 2021-02-08T16:30:00+00:00
+ 2021-02-11T16:30:00+00:00
1.00