<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html;charset=utf-8"/>
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
<meta name="description" content="A PyTorch implementation/tutorial of HyperLSTM, introduced in the paper HyperNetworks."/>
<meta name="twitter:card" content="summary"/>
<meta name="twitter:image:src" content="https://avatars1.githubusercontent.com/u/64068543?s=400&amp;v=4"/>
<meta name="twitter:title" content="HyperNetworks - HyperLSTM"/>
<meta name="twitter:description" content="A PyTorch implementation/tutorial of HyperLSTM, introduced in the paper HyperNetworks."/>
<meta name="twitter:site" content="@labmlai"/>
<meta name="twitter:creator" content="@labmlai"/>
<meta property="og:url" content="https://nn.labml.ai/hypernetworks/hyper_lstm.html"/>
<meta property="og:title" content="HyperNetworks - HyperLSTM"/>
<meta property="og:image" content="https://avatars1.githubusercontent.com/u/64068543?s=400&amp;v=4"/>
<meta property="og:site_name" content="LabML Neural Networks"/>
<meta property="og:type" content="object"/>
<meta property="og:description" content="A PyTorch implementation/tutorial of HyperLSTM, introduced in the paper HyperNetworks."/>
<title>HyperNetworks - HyperLSTM</title>
<link rel="shortcut icon" href="/icon.png"/>
<link rel="stylesheet" href="../pylit.css?v=1">
<link rel="canonical" href="https://nn.labml.ai/hypernetworks/hyper_lstm.html"/>
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/katex@0.13.18/dist/katex.min.css" integrity="sha384-zTROYFVGOfTw7JV7KUu8udsvW2fx4lWOsCEDqhBreBwlHI4ioVRtmIvEThzJHGET" crossorigin="anonymous">
<!-- Global site tag (gtag.js) - Google Analytics -->
<script async src="https://www.googletagmanager.com/gtag/js?id=G-4V3HC8HBLH"></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag() {
dataLayer.push(arguments);
}
gtag('js', new Date());
gtag('config', 'G-4V3HC8HBLH');
</script>
</head>
<body>
<div id='container'>
<div id="background"></div>
<div class='section'>
<div class='docs'>
<p>
<a class="parent" href="/">home</a>
<a class="parent" href="index.html">hypernetworks</a>
</p>
<p>
<a href="https://github.com/labmlai/annotated_deep_learning_paper_implementations/tree/master/labml_nn/hypernetworks/hyper_lstm.py">
<img alt="Github"
src="https://img.shields.io/github/stars/labmlai/annotated_deep_learning_paper_implementations?style=social"
style="max-width:100%;"/></a>
<a href="https://twitter.com/labmlai"
rel="nofollow">
<img alt="Twitter"
src="https://img.shields.io/twitter/follow/labmlai?style=social"
style="max-width:100%;"/></a>
</p>
</div>
</div>
<div class='section' id='section-0'>
<div class='docs doc-strings'>
<div class='section-link'>
<a href='#section-0'>#</a>
</div>
<h1>HyperNetworks - HyperLSTM</h1>
<p>We have implemented HyperLSTM, introduced in the paper <a href="https://papers.labml.ai/paper/1609.09106">HyperNetworks</a>, with annotations, using <a href="https://pytorch.org">PyTorch</a>. <a href="https://blog.otoro.net/2016/09/28/hyper-networks/">This blog post</a> by David Ha gives a good explanation of HyperNetworks.</p>
<p>We have an experiment that trains a HyperLSTM to predict text on the Shakespeare dataset. Here&#x27;s the link to the code: <a href="experiment.html"><code class="highlight"><span></span><span class="n">experiment</span><span class="o">.</span><span class="n">py</span></code>
</a></p>
<p><a href="https://colab.research.google.com/github/labmlai/annotated_deep_learning_paper_implementations/blob/master/labml_nn/hypernetworks/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg"></a> <a href="https://app.labml.ai/run/9e7f39e047e811ebbaff2b26e3148b3d"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen"></a></p>
<p>HyperNetworks use a smaller network to generate the weights of a larger network. There are two variants: static HyperNetworks and dynamic HyperNetworks. Static HyperNetworks use a smaller network to generate the weights (kernels) of a convolutional network. Dynamic HyperNetworks generate the parameters of a recurrent neural network at each step. This is an implementation of the latter.</p>
<h2>Dynamic HyperNetworks</h2>
<p>In an RNN, the parameters stay the same at every step. Dynamic HyperNetworks generate different parameters for each step. HyperLSTM has the structure of an LSTM, but the parameters at each step are modified by a smaller LSTM network.</p>
<p>In the basic form, a Dynamic HyperNetwork has a smaller recurrent network that generates a feature vector corresponding to each parameter tensor of the larger recurrent network. Let&#x27;s say the larger network has some parameter <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="mord" style="color:cyan"><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.13889em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span> the smaller network generates a feature vector <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="mord coloredeq eqbn" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span 
class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span> and we dynamically compute <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="mord" style="color:cyan"><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.13889em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span> as a linear transformation of <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="mord coloredeq eqbn" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span 
class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span>. For instance <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="mord" style="color:cyan"><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.13889em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen"></span><span class="mord coloredeq eqbj" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.13889em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span><span class="mord mtight coloredeq eqbu" style=""><span class="mord 
mathnormal mtight" style="margin-right:0.04398em">z</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord coloredeq eqbn" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="mclose"></span></span></span></span> where <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="mord coloredeq eqbj" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.13889em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span><span class="mord mtight coloredeq eqbu" 
style=""><span class="mord mathnormal mtight" style="margin-right:0.04398em">z</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span> is a 3-d tensor parameter and <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mopen"></span><span class="mord">.</span><span class="mclose"></span></span></span></span> is a tensor-vector multiplication. <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="mord coloredeq eqbn" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span> is usually a linear transformation of the output of the smaller recurrent network.</p>
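<p>The basic form described above can be sketched in a few lines of PyTorch. This is only an illustration, not the implementation on this page: the names <code>n_h</code>, <code>n_z</code>, <code>w_hz</code>, <code>z_h</code> and <code>w_h</code> and the sizes are assumptions chosen for the example, and <code>z_h</code> is a random stand-in for the feature vector that the smaller network would produce.</p>

```python
import torch

torch.manual_seed(0)

n_h = 8  # hidden size of the larger network (illustrative value)
n_z = 3  # size of the feature vector z_h (illustrative value)

# 3-d tensor parameter W_hz with shape [n_h, n_h, n_z]
w_hz = torch.randn(n_h, n_h, n_z)
# Feature vector z_h; random here, standing in for the smaller network's output
z_h = torch.randn(n_z)

# Tensor-vector multiplication: contract the last axis of W_hz with z_h,
# which gives the dynamically computed weight matrix W_h of shape [n_h, n_h]
w_h = torch.matmul(w_hz, z_h)

# The generated matrix is then used like any other recurrent weight
h = torch.randn(n_h)
out = torch.matmul(w_h, h)
```

<p>In a real model <code>z_h</code> would change at every step, so <code>w_h</code> would be recomputed at every step as well.</p>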
<h3>Weight scaling instead of computing</h3>
<p>Large recurrent networks have large dynamically computed parameters. These are calculated using linear transformation of feature vector <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span></span></span></span>. And this transformation requires an even larger weight tensor. That is, when <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="mord" style="color:cyan"><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.13889em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span> has shape <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="mord coloredeq eqz" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.10903em">N</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.10903em;margin-right:0.05em;"><span class="pstrut" 
style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin" style="">×</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.10903em">N</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.10903em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span>, <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="mord coloredeq eqbj" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.13889em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord 
mathnormal mtight" style="">h</span></span><span class="mord mtight coloredeq eqbu" style=""><span class="mord mathnormal mtight" style="margin-right:0.04398em">z</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span> will be <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="mord coloredeq eqz" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.10903em">N</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.10903em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin" style="">×</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.10903em">N</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.10903em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" 
style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">×</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.10903em;">N</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.151392em;"><span style="top:-2.5500000000000003em;margin-left:-0.10903em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight coloredeq eqbu" style=""><span class="mord mathnormal mtight" style="margin-right:0.04398em">z</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span>.</p>
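<p>To get a feel for the sizes involved, here is a small arithmetic sketch. The values of <code>n_h</code> and <code>n_z</code> are illustrative assumptions, not settings taken from this implementation.</p>

```python
n_h = 1000  # hidden size of the larger network (illustrative value)
n_z = 4     # size of the feature vector z (illustrative value)

# Number of entries in the dynamically computed matrix W_h
w_h_size = n_h * n_h
# Number of entries in the 3-d tensor W_hz that generates it
w_hz_size = n_h * n_h * n_z

print(w_h_size)   # 1000000
print(w_hz_size)  # 4000000
```

<p>The generating tensor is <code>n_z</code> times larger than the weight matrix it produces, and one such tensor is needed for every weight matrix that is generated this way.</p>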
<p>To overcome this, we compute the weight parameters of the recurrent network by dynamically scaling each row of a matrix of the same size.</p>
<span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:8.165125000000003em;vertical-align:-3.8325625000000016em;"></span><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:4.332562500000001em;"><span style="top:-8.175125000000001em;"><span class="pstrut" style="height:4.6825625em;"></span><span class="mord"><span class="mord coloredeq eqbo" style=""><span class="mord mathnormal" style="">d</span></span><span class="mopen">(</span><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="mclose">)</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord coloredeq eqbj" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.13889em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span><span class="mord mtight coloredeq eqbu" style=""><span class="mord mathnormal mtight" style="margin-right:0.04398em">z</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="mord coloredeq eqbn" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord 
mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span><span style="top:-6.675125em;"><span class="pstrut" style="height:4.6825625em;"></span><span class="mord"></span></span><span style="top:-3.332562499999999em;"><span class="pstrut" style="height:4.6825625em;"></span><span class="mord"><span class="mord" style="color:cyan"><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.13889em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="minner"><span class="mopen"><span class="delimsizing mult"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.65003em;"><span 
style="top:-1.7110100000000004em;"><span class="pstrut" style="height:3.2160200000000003em;"></span><span class="delimsizinginner delim-size4"><span></span></span></span><span style="top:-2.85801em;"><span class="pstrut" style="height:3.2160200000000003em;"></span><span style="height:1.2160200000000003em;width:0.875em;"><svg height="1.2160200000000003em" preserveaspectratio="xMinYMin" style="width:0.875em" viewbox="0 0 875 1216" width="0.875em" xmlns="http://www.w3.org/2000/svg"><path d="M291 0 H417 V1216 H291z M291 0 H417 V1216 H291z"></path></svg></span></span><span style="top:-4.71105em;"><span class="pstrut" style="height:3.2160200000000003em;"></span><span class="delimsizinginner delim-size4"><span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:2.15003em;"><span></span></span></span></span></span></span><span class="mord"><span class="mtable"><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.6825625000000004em;"><span style="top:-4.8425625000000005em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord coloredeq eqbo" style=""><span class="mord mathnormal" style="">d</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="mclose">)</span><span class="mord"><span class="mord 
mathnormal" style="margin-right:0.13889em;">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span><span class="mord mtight"><span class="mord mtight coloredeq eqbo" style=""><span class="mord mathnormal mtight" style="">d</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31731428571428577em;"><span style="top:-2.357em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight">0</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.143em;"><span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.2501em;"><span></span></span></span></span></span></span></span></span><span style="top:-3.6425625em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord coloredeq eqbo" style=""><span class="mord mathnormal" style="">d</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.30110799999999993em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" 
style="height:0.15em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="mclose">)</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.13889em;">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span><span class="mord mtight"><span class="mord mtight coloredeq eqbo" style=""><span class="mord mathnormal mtight" style="">d</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.31731428571428577em;"><span style="top:-2.357em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight">1</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.143em;"><span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.2501em;"><span></span></span></span></span></span></span></span></span><span style="top:-2.4425624999999997em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">...</span></span></span><span style="top:-1.2425624999999996em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord coloredeq eqbo" style=""><span class="mord mathnormal" style="">d</span></span><span 
class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.32833099999999993em;"><span style="top:-2.55em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.10903em;">N</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3448em;"><span style="top:-2.3487714285714287em;margin-left:-0.10903em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15122857142857138em;"><span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.25586em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="mclose">)</span><span class="mord"><span class="mord mathnormal" style="margin-right:0.13889em;">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999985em;"><span style="top:-2.5500000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span><span class="mord mtight"><span class="mord mtight coloredeq eqbo" style=""><span class="mord 
mathnormal mtight" style="">d</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3448em;"><span style="top:-2.3567071428571427em;margin-right:0.07142857142857144em;"><span class="pstrut" style="height:2.5em;"></span><span class="sizing reset-size3 size1 mtight"><span class="mord mtight"><span class="mord mtight"><span class="mord mathnormal mtight" style="margin-right:0.10903em;">N</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.3448em;"><span style="top:-2.3448em;margin-left:-0.10903em;margin-right:0.1em;"><span class="pstrut" style="height:2.69444em;"></span><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.34963999999999995em;"><span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.39303571428571427em;"><span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.425125em;"><span></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:2.1825625000000004em;"><span></span></span></span></span></span></span></span><span class="mclose"><span class="delimsizing mult"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.65003em;"><span style="top:-1.7110100000000004em;"><span class="pstrut" style="height:3.2160200000000003em;"></span><span class="delimsizinginner delim-size4"><span></span></span></span><span style="top:-2.85801em;"><span class="pstrut" style="height:3.2160200000000003em;"></span><span 
style="height:1.2160200000000003em;width:0.875em;"><svg height="1.2160200000000003em" preserveaspectratio="xMinYMin" style="width:0.875em" viewbox="0 0 875 1216" width="0.875em" xmlns="http://www.w3.org/2000/svg"><path d="M457 0 H583 V1216 H457z M457 0 H583 V1216 H457z"></path></svg></span></span><span style="top:-4.71105em;"><span class="pstrut" style="height:3.2160200000000003em;"></span><span class="delimsizinginner delim-size4"><span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:2.15003em;"><span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:3.8325625000000016em;"><span></span></span></span></span></span></span></span></span></span></span></span><p>where <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="mord coloredeq eqbi" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.13889em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span><span class="mord mtight coloredeq eqbo" style=""><span class="mord mathnormal mtight" style="">d</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span> is a <span class="katex"><span aria-hidden="true" 
class="katex-html"><span class="base"><span class="strut" style="height:0.83333em;vertical-align:-0.15em;"></span><span class="mord coloredeq eqz" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.10903em">N</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.10903em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin" style="">×</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.10903em">N</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.10903em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span> parameter matrix.</p>
<p>We can optimize this further by computing <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.84444em;vertical-align:-0.15em;"></span><span class="mord" style="color:cyan"><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.13889em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="mord coloredeq eqbr" style=""><span class="mord mathnormal" style="">h</span></span></span></span></span> as <span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord coloredeq eqbo" style=""><span class="mord mathnormal" style="">d</span></span><span class="mopen" style="color:lightgreen;">(</span><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="mclose" style="color:lightgreen;">)</span><span class="mord coloredeq eqbk" style=""><span class="mord" style="">⊙</span></span><span class="mopen" style="color:lightgreen;">(</span><span class="mord coloredeq eqbi" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.13889em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span 
style="top:-2.5500000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span><span class="mord mtight coloredeq eqbo" style=""><span class="mord mathnormal mtight" style="">d</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="mord coloredeq eqbr" style=""><span class="mord mathnormal" style="">h</span></span><span class="mclose" style="color:lightgreen;">)</span></span></span></span></span> where <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.66666em;vertical-align:-0.08333em;"></span><span class="mord coloredeq eqbk" style=""><span class="mord" style="">⊙</span></span></span></span></span> stands for element-wise multiplication.</p>
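<p>The identity behind this optimization can be checked numerically. The block below is only a minimal sketch: it uses plain Python lists instead of PyTorch tensors so that it runs without dependencies, the helper names <code>matvec</code>, <code>scale_rows_then_multiply</code> and <code>multiply_then_scale</code> are made up for this illustration, and the random values are placeholders rather than trained weights. It compares scaling the rows of W<sub>hd</sub> by d(z) before the product with h against scaling the product element-wise afterwards.</p>

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def scale_rows_then_multiply(d, w_hd, h):
    """Build the row-scaled matrix diag(d) W_hd explicitly, then multiply by h."""
    scaled = [[d_i * w for w in row] for d_i, row in zip(d, w_hd)]
    return matvec(scaled, h)


def multiply_then_scale(d, w_hd, h):
    """Compute W_hd h once, then scale the result element-wise by d."""
    return [d_i * y_i for d_i, y_i in zip(d, matvec(w_hd, h))]


random.seed(0)
n_h = 4  # toy hidden size, chosen arbitrarily for the illustration
w_hd = [[random.uniform(-1, 1) for _ in range(n_h)] for _ in range(n_h)]
h = [random.uniform(-1, 1) for _ in range(n_h)]
d = [random.uniform(-1, 1) for _ in range(n_h)]

slow = scale_rows_then_multiply(d, w_hd, h)
fast = multiply_then_scale(d, w_hd, h)

# Both orders agree up to floating point rounding
max_diff = max(abs(a - b) for a, b in zip(slow, fast))
print(max_diff < 1e-12)
```

<p>The second form never materializes the row-scaled matrix, so each step costs one matrix-vector product plus one multiplication per hidden unit.</p>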
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">73</span><span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">Optional</span><span class="p">,</span> <span class="n">Tuple</span>
<span class="lineno">74</span>
<span class="lineno">75</span><span class="kn">import</span> <span class="nn">torch</span>
<span class="lineno">76</span><span class="kn">from</span> <span class="nn">torch</span> <span class="kn">import</span> <span class="n">nn</span>
<span class="lineno">77</span>
<span class="lineno">78</span><span class="kn">from</span> <span class="nn">labml_helpers.module</span> <span class="kn">import</span> <span class="n">Module</span>
<span class="lineno">79</span><span class="kn">from</span> <span class="nn">labml_nn.lstm</span> <span class="kn">import</span> <span class="n">LSTMCell</span></pre></div>
</div>
</div>
<div class='section' id='section-1'>
<div class='docs doc-strings'>
<div class='section-link'>
<a href='#section-1'>#</a>
</div>
<h2>HyperLSTM Cell</h2>
<p>In HyperLSTM, both the smaller network and the larger network have the LSTM structure. This is defined in Appendix A.2.2 of the paper.</p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">82</span><span class="k">class</span> <span class="nc">HyperLSTMCell</span><span class="p">(</span><span class="n">Module</span><span class="p">):</span></pre></div>
</div>
</div>
<div class='section' id='section-2'>
<div class='docs doc-strings'>
<div class='section-link'>
<a href='#section-2'>#</a>
</div>
<p> <code class="highlight"><span></span><span class="n">input_size</span></code>
is the size of the input <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="mord coloredeq eqbm" style=""><span class="mord" style=""><span class="mord mathnormal" style="">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span>, <code class="highlight"><span></span><span class="n">hidden_size</span></code>
is the size of the LSTM, and <code class="highlight"><span></span><span class="n">hyper_size</span></code>
is the size of the smaller LSTM that alters the weights of the larger outer LSTM. <code class="highlight"><span></span><span class="n">n_z</span></code>
is the size of the feature vectors used to alter the LSTM weights.</p>
<p>We use the output of the smaller LSTM to compute <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.9991079999999999em;vertical-align:-0.15em;"></span><span class="mord coloredeq eqbd" style=""><span class="mord" style=""><span class="mord coloredeq eqbn" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8491079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" 
style="">o</span></span></span></span></span></span></span></span></span></span></span></span></span></span>, <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.0961079999999999em;vertical-align:-0.247em;"></span><span class="mord coloredeq eqbe" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8491079999999999em;"><span style="top:-2.4530000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">x</span></span></span><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span></span></span></span></span> and <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" 
style="height:1.2683239999999998em;vertical-align:-0.3013079999999999em;"></span><span class="mord coloredeq eqbc" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999999em;"><span style="top:-2.3986920000000005em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">b</span></span></span><span style="top:-3.1809080000000005em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span></span></span></span></span> using linear transformations. 
We calculate <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.2683239999999998em;vertical-align:-0.3013079999999999em;"></span><span class="mord coloredeq eqq" style=""><span class="mord" style=""><span class="mord coloredeq eqbo" style=""><span class="mord mathnormal" style="">d</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999999em;"><span style="top:-2.3986920000000005em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span><span style="top:-3.1809080000000005em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span><span class="mopen" style="">(</span><span class="mord coloredeq eqbd" style=""><span class="mord" style=""><span class="mord coloredeq eqbn" style=""><span class="mord" style=""><span class="mord 
coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8491079999999999em;"><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span></span></span></span></span></span><span class="mclose" style="">)</span></span></span></span></span>, <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.0991079999999998em;vertical-align:-0.25em;"></span><span class="mord coloredeq eqr" style=""><span class="mord" style=""><span 
class="mord coloredeq eqbo" style=""><span class="mord mathnormal" style="">d</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8491079999999999em;"><span style="top:-2.4530000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">x</span></span></span><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span><span class="mopen" style="">(</span><span class="mord coloredeq eqbe" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8491079999999999em;"><span style="top:-2.4530000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal 
mtight" style="">x</span></span></span><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span></span><span class="mclose" style="">)</span></span></span></span></span>, and <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.2683239999999998em;vertical-align:-0.3013079999999999em;"></span><span class="mord coloredeq eqp" style=""><span class="mord" style=""><span class="mord coloredeq eqbo" style=""><span class="mord mathnormal" style="">d</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999999em;"><span style="top:-2.3986920000000005em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">b</span></span></span><span style="top:-3.1809080000000005em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span 
class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span><span class="mopen" style="">(</span><span class="mord coloredeq eqbc" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999999em;"><span style="top:-2.3986920000000005em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">b</span></span></span><span style="top:-3.1809080000000005em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight 
coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span></span><span class="mclose" style="">)</span></span></span></span></span> from these, using linear transformations again. These are then used to scale the rows of weight and bias tensors of the main LSTM.</p>
<p>📝 Since <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span></span></span></span> and <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord coloredeq eqbo" style=""><span class="mord mathnormal" style="">d</span></span></span></span></span> are computed by two sequential linear transformations, they can be combined into a single linear transformation. However, we&#x27;ve implemented them separately to match the description in the paper.</p>
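<p>To illustrate the note above, here is a small sketch of how two sequential linear transformations collapse into one. It is not the implementation on this page: it uses plain Python lists, the names <code>w_z</code>, <code>b_z</code>, <code>w_d</code>, <code>h_hat</code>, <code>w_combined</code> and <code>b_combined</code> are hypothetical, and it assumes the first transformation has a bias while the second does not.</p>

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    b_columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_columns] for row in a]


def random_matrix(rows, cols):
    """Create a rows x cols matrix of placeholder values."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(1)
hyper_size, n_z, hidden_size = 3, 2, 4  # toy sizes, chosen arbitrarily

w_z = random_matrix(n_z, hyper_size)    # assumed first transformation (with bias)
b_z = [random.uniform(-1, 1) for _ in range(n_z)]
w_d = random_matrix(hidden_size, n_z)   # assumed second transformation (no bias)
h_hat = [random.uniform(-1, 1) for _ in range(hyper_size)]

# Two steps: z = w_z h_hat + b_z, then d = w_d z
z = [a + b for a, b in zip(matvec(w_z, h_hat), b_z)]
d_two_step = matvec(w_d, z)

# One step: d = (w_d w_z) h_hat + (w_d b_z)
w_combined = matmul(w_d, w_z)
b_combined = matvec(w_d, b_z)
d_one_step = [a + b for a, b in zip(matvec(w_combined, h_hat), b_combined)]

# Both routes agree up to floating point rounding
max_diff = max(abs(a - b) for a, b in zip(d_two_step, d_one_step))
print(max_diff < 1e-12)
```

<p>The combined matrix has <code>hidden_size</code> rows and <code>hyper_size</code> columns, and its rank is at most <code>n_z</code>. Keeping the two steps separate makes that bottleneck explicit.</p>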
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">90</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">input_size</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">hidden_size</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">hyper_size</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">n_z</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span></pre></div>
</div>
</div>
<div class='section' id='section-3'>
<div class='docs'>
<div class='section-link'>
<a href='#section-3'>#</a>
</div>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">108</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span></pre></div>
</div>
</div>
<div class='section' id='section-4'>
<div class='docs'>
<div class='section-link'>
<a href='#section-4'>#</a>
</div>
<p>The input to the hyperLSTM is <span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.84444em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.69444em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord mathnormal">x</span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.22222em;"><span class="mord">^</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:2.40003em;vertical-align:-0.95003em;"></span><span class="minner"><span class="mopen delimcenter" style="top:0em;"><span class="delimsizing size3">(</span></span><span class="mord"><span class="mtable"><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.45em;"><span style="top:-3.61em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord coloredeq eqbh" style=""><span class="mord" style=""><span class="mord coloredeq eqbr" style=""><span class="mord mathnormal" 
style="">h</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.301108em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mathnormal mtight" style="">t</span><span class="mbin mtight" style="">−</span><span class="mord mtight" style="">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.208331em;"><span></span></span></span></span></span></span></span></span></span><span style="top:-2.4099999999999997em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord coloredeq eqbm" style=""><span class="mord" style=""><span class="mord mathnormal" style="">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.9500000000000004em;"><span></span></span></span></span></span></span></span><span class="mclose delimcenter" style="top:0em;"><span class="delimsizing size3">)</span></span></span></span></span></span></span> where <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="mord coloredeq eqbm" style=""><span 
class="mord" style=""><span class="mord mathnormal" style="">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span> is the input and <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.902771em;vertical-align:-0.208331em;"></span><span class="mord coloredeq eqbh" style=""><span class="mord" style=""><span class="mord coloredeq eqbr" style=""><span class="mord mathnormal" style="">h</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.301108em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mathnormal mtight" style="">t</span><span class="mbin mtight" style="">−</span><span class="mord mtight" style="">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.208331em;"><span></span></span></span></span></span></span></span></span></span></span> is the output of the outer LSTM at the previous step, so the input size is <code class="highlight"><span></span><span class="n">hidden_size</span> <span class="o">+</span> <span class="n">input_size</span></code>.</p>
<p>The output of hyperLSTM is <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.1078799999999998em;vertical-align:-0.15em;"></span><span class="mord coloredeq eqbg" style=""><span class="mord" style=""><span class="mord accent" style=""><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.9578799999999998em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord coloredeq eqbr" style=""><span class="mord mathnormal" style="">h</span></span></span><span style="top:-3.26344em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.25em;"><span class="mord" style="">^</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span> and <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.84444em;vertical-align:-0.15em;"></span><span class="mord coloredeq eqbf" style=""><span class="mord" style=""><span class="mord accent" style=""><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.69444em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord mathnormal" style="">c</span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.19444em;"><span class="mord" 
style="">^</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span>. </p>
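<p>The shapes described above can be sketched as follows. This is a minimal example that uses PyTorch&#x27;s built-in <code>nn.LSTMCell</code> as a stand-in for the layer-normalized <code>LSTMCell</code> used on this page; the batch size, the dimensions, and the zero initial state are assumptions chosen for illustration.</p>

```python
import torch
from torch import nn

# Illustrative sizes only (assumptions for this sketch)
batch_size, input_size, hidden_size, hyper_size = 2, 5, 16, 8

x_t = torch.randn(batch_size, input_size)      # current input x_t
h_prev = torch.randn(batch_size, hidden_size)  # outer LSTM output h_{t-1}

# x_hat_t is the concatenation of h_{t-1} and x_t along the feature dimension
x_hat = torch.cat((h_prev, x_t), dim=-1)

# Stand-in for the hyper LSTM cell: torch's nn.LSTMCell, without layer norm
hyper = nn.LSTMCell(hidden_size + input_size, hyper_size)

# Previous hyper state, initialised to zeros here
h_hat_prev = torch.zeros(batch_size, hyper_size)
c_hat_prev = torch.zeros(batch_size, hyper_size)

# One step of the smaller network gives h_hat_t and c_hat_t
h_hat, c_hat = hyper(x_hat, (h_hat_prev, c_hat_prev))
```

<p>The concatenated input has <code>hidden_size + input_size</code> features, while both outputs of the smaller cell have <code>hyper_size</code> features.</p>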
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">121</span> <span class="bp">self</span><span class="o">.</span><span class="n">hyper</span> <span class="o">=</span> <span class="n">LSTMCell</span><span class="p">(</span><span class="n">hidden_size</span> <span class="o">+</span> <span class="n">input_size</span><span class="p">,</span> <span class="n">hyper_size</span><span class="p">,</span> <span class="n">layer_norm</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span></pre></div>
</div>
</div>
<div class='section' id='section-5'>
<div class='docs'>
<div class='section-link'>
<a href='#section-5'>#</a>
</div>
<p><span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.049108em;vertical-align:-0.15em;"></span><span class="mord coloredeq eqbd" style=""><span class="mord" style=""><span class="mord coloredeq eqbn" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999999em;"><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" 
style="">o</span></span></span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.2683239999999998em;vertical-align:-0.3013079999999999em;"></span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord coloredeq eqbs" style=""><span class="mord mathnormal" style="">i</span></span><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" 
style="height:0.3013079999999999em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord coloredeq eqbg" style=""><span class="mord" style=""><span class="mord accent" style=""><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.9578799999999998em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord coloredeq eqbr" style=""><span class="mord mathnormal" style="">h</span></span></span><span style="top:-3.26344em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.25em;"><span class="mord" style="">^</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span> 🤔 In the paper it was specified as <span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.049108em;vertical-align:-0.15em;"></span><span class="mord coloredeq eqbd" style=""><span class="mord" style=""><span class="mord coloredeq eqbn" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" 
style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999999em;"><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.2683239999999998em;vertical-align:-0.3013079999999999em;"></span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord coloredeq eqbs" style=""><span class="mord mathnormal" style="">i</span></span><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" 
style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.9578799999999998em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord coloredeq eqbr" style=""><span class="mord mathnormal" style="">h</span></span></span><span style="top:-3.26344em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.25em;"><span class="mord">^</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" 
style="height:0.301108em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight" style="color:red"><span class="mord mathnormal mtight" style="">t</span><span class="mbin mtight" style="">−</span><span class="mord mtight" style="">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.208331em;"><span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span> We suspect this is a typo. </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">127</span> <span class="bp">self</span><span class="o">.</span><span class="n">z_h</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">hyper_size</span><span class="p">,</span> <span class="mi">4</span> <span class="o">*</span> <span class="n">n_z</span><span class="p">)</span></pre></div>
</div>
</div>
<div class='section' id='section-6'>
<div class='docs'>
<div class='section-link'>
<a href='#section-6'>#</a>
</div>
<p><span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.146108em;vertical-align:-0.247em;"></span><span class="mord coloredeq eqbe" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999998em;"><span style="top:-2.4530000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">x</span></span></span><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" 
style="height:1.2078799999999998em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord coloredeq eqbs" style=""><span class="mord mathnormal" style="">i</span></span><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999998em;"><span style="top:-2.4530000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">x</span></span></span><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord coloredeq eqbg" style=""><span class="mord" style=""><span class="mord accent" style=""><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.9578799999999998em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord coloredeq eqbr" style=""><span class="mord mathnormal" 
style="">h</span></span></span><span style="top:-3.26344em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.25em;"><span class="mord" style="">^</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span> </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">129</span> <span class="bp">self</span><span class="o">.</span><span class="n">z_x</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">hyper_size</span><span class="p">,</span> <span class="mi">4</span> <span class="o">*</span> <span class="n">n_z</span><span class="p">)</span></pre></div>
</div>
</div>
<div class='section' id='section-7'>
<div class='docs'>
<div class='section-link'>
<a href='#section-7'>#</a>
</div>
<p><span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.2683239999999998em;vertical-align:-0.3013079999999999em;"></span><span class="mord coloredeq eqbc" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">b</span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" 
style="height:1.2683239999999998em;vertical-align:-0.3013079999999999em;"></span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord coloredeq eqbs" style=""><span class="mord mathnormal" style="">i</span></span><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">b</span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord coloredeq eqbg" style=""><span class="mord" style=""><span class="mord accent" style=""><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.9578799999999998em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord coloredeq eqbr" style=""><span class="mord 
mathnormal" style="">h</span></span></span><span style="top:-3.26344em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.25em;"><span class="mord" style="">^</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span> </p>
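<p>To make the three projections above concrete, here is a small sketch of how one linear layer with <code>4 * n_z</code> outputs can be split into the four gate-specific vectors. The sizes, the use of <code>chunk</code>, and the variable names ending in <code>_gates</code> are assumptions for illustration; the gate order (i, f, g, o) follows the superscripts in the equations.</p>

```python
import torch
from torch import nn

# Illustrative sizes only (assumptions for this sketch)
batch_size, hyper_size, n_z = 2, 8, 4

# One projection per target (h, x, b), each covering all four gates at once
z_h = nn.Linear(hyper_size, 4 * n_z)
z_x = nn.Linear(hyper_size, 4 * n_z)
z_b = nn.Linear(hyper_size, 4 * n_z, bias=False)

h_hat = torch.randn(batch_size, hyper_size)  # output of the hyper LSTM at step t

# Split each projection into four n_z-sized pieces, one per gate (i, f, g, o)
z_h_gates = z_h(h_hat).chunk(4, dim=-1)
z_x_gates = z_x(h_hat).chunk(4, dim=-1)
z_b_gates = z_b(h_hat).chunk(4, dim=-1)

# For example, the embedding that feeds the input-gate scaling for the hidden weights
z_h_i = z_h_gates[0]
```

<p>Using one layer of width <code>4 * n_z</code> and splitting it is equivalent to four separate layers of width <code>n_z</code>, and keeps the number of matrix multiplications small.</p>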
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">131</span> <span class="bp">self</span><span class="o">.</span><span class="n">z_b</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">hyper_size</span><span class="p">,</span> <span class="mi">4</span> <span class="o">*</span> <span class="n">n_z</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span></pre></div>
</div>
</div>
<div class='section' id='section-8'>
<div class='docs'>
<div class='section-link'>
<a href='#section-8'>#</a>
</div>
<p><span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.2683239999999998em;vertical-align:-0.3013079999999999em;"></span><span class="mord coloredeq eqq" style=""><span class="mord" style=""><span class="mord coloredeq eqbo" style=""><span class="mord mathnormal" style="">d</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span><span class="mopen" style="">(</span><span class="mord coloredeq eqbd" style=""><span class="mord" style=""><span class="mord coloredeq eqbn" style=""><span class="mord" style=""><span 
class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999999em;"><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span></span></span></span></span></span><span class="mclose" style="">)</span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" 
style="height:1.2683239999999998em;vertical-align:-0.3013079999999999em;"></span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord coloredeq eqbs" style=""><span class="mord mathnormal" style="">i</span></span><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbo" style=""><span class="mord mathnormal mtight" style="">d</span></span><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord coloredeq eqbd" style=""><span class="mord" style=""><span class="mord coloredeq eqbn" style=""><span class="mord" 
style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999999em;"><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span> </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">134</span> <span class="n">d_h</span> <span class="o">=</span> <span class="p">[</span><span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">n_z</span><span class="p">,</span> <span class="n">hidden_size</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">)]</span>
<span class="lineno">135</span> <span class="bp">self</span><span class="o">.</span><span class="n">d_h</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">ModuleList</span><span class="p">(</span><span class="n">d_h</span><span class="p">)</span></pre></div>
</div>
</div>
<div class='section' id='section-9'>
<div class='docs'>
<div class='section-link'>
<a href='#section-9'>#</a>
</div>
<p><span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.1491079999999998em;vertical-align:-0.25em;"></span><span class="mord coloredeq eqr" style=""><span class="mord" style=""><span class="mord coloredeq eqbo" style=""><span class="mord mathnormal" style="">d</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999998em;"><span style="top:-2.4530000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">x</span></span></span><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span><span class="mopen" style="">(</span><span class="mord coloredeq eqbe" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span 
class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999998em;"><span style="top:-2.4530000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">x</span></span></span><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span></span><span class="mclose" style="">)</span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.2683239999999998em;vertical-align:-0.3013079999999999em;"></span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord coloredeq eqbs" style=""><span class="mord mathnormal" style="">i</span></span><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span 
class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbo" style=""><span class="mord mathnormal mtight" style="">d</span></span><span class="mord mathnormal mtight">x</span></span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord coloredeq eqbe" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999998em;"><span style="top:-2.4530000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">x</span></span></span><span 
style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span> </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">137</span> <span class="n">d_x</span> <span class="o">=</span> <span class="p">[</span><span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">n_z</span><span class="p">,</span> <span class="n">hidden_size</span><span class="p">,</span> <span class="n">bias</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">)]</span>
<span class="lineno">138</span> <span class="bp">self</span><span class="o">.</span><span class="n">d_x</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">ModuleList</span><span class="p">(</span><span class="n">d_x</span><span class="p">)</span></pre></div>
</div>
</div>
<div class='section' id='section-10'>
<div class='docs'>
<div class='section-link'>
<a href='#section-10'>#</a>
</div>
<p><span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.2683239999999998em;vertical-align:-0.3013079999999999em;"></span><span class="mord coloredeq eqp" style=""><span class="mord" style=""><span class="mord coloredeq eqbo" style=""><span class="mord mathnormal" style="">d</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">b</span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span><span class="mopen" style="">(</span><span class="mord coloredeq eqbc" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span 
class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">b</span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span></span><span class="mclose" style="">)</span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.2683239999999998em;vertical-align:-0.3013079999999999em;"></span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord coloredeq eqbs" style=""><span class="mord mathnormal" style="">i</span></span><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span 
class="vlist-r"><span class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbo" style=""><span class="mord mathnormal mtight" style="">d</span></span><span class="mord mathnormal mtight">b</span></span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord coloredeq eqbc" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">b</span></span></span><span 
style="top:-3.180908em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span> </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">140</span> <span class="n">d_b</span> <span class="o">=</span> <span class="p">[</span><span class="n">nn</span><span class="o">.</span><span class="n">Linear</span><span class="p">(</span><span class="n">n_z</span><span class="p">,</span> <span class="n">hidden_size</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">)]</span>
<span class="lineno">141</span> <span class="bp">self</span><span class="o">.</span><span class="n">d_b</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">ModuleList</span><span class="p">(</span><span class="n">d_b</span><span class="p">)</span></pre></div>
</div>
</div>
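<div class='section'>
<div class='docs'>
<p>The three groups of projections above can be exercised on their own. The following is a minimal sketch, not the author's code: the sizes <code>n_z</code>, <code>hidden_size</code> and <code>batch</code> and the random embeddings <code>z_h</code> and <code>z_b</code> are illustrative assumptions. It shows that each gate gets its own linear map from an embedding of size <code>n_z</code> to a vector of size <code>hidden_size</code>, and that only the bias projection keeps a bias term.</p>
</div>
<div class='code'>

```python
import torch
from torch import nn

# Illustrative sizes (assumptions, not taken from the source)
n_z, hidden_size, batch = 4, 8, 3

# One bias-free projection per gate (i, f, g, o), as for d_h and d_x
d_h = nn.ModuleList([nn.Linear(n_z, hidden_size, bias=False) for _ in range(4)])
# The bias projection d_b keeps nn.Linear's default bias term
d_b = nn.ModuleList([nn.Linear(n_z, hidden_size) for _ in range(4)])

# Stand-ins for the per-gate embeddings produced by the hyper-network
z_h = [torch.randn(batch, n_z) for _ in range(4)]
z_b = [torch.randn(batch, n_z) for _ in range(4)]

# Each embedding is mapped to a vector with one entry per hidden unit
scales = [d_h[i](z_h[i]) for i in range(4)]
biases = [d_b[i](z_b[i]) for i in range(4)]
```

</div>
</div>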
<div class='section' id='section-11'>
<div class='docs'>
<div class='section-link'>
<a href='#section-11'>#</a>
</div>
<p>The weight matrices <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.2683239999999998em;vertical-align:-0.3013079999999999em;"></span><span class="mord coloredeq eqba" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.13889em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999999em;"><span style="top:-2.3986920000000005em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span><span style="top:-3.1809080000000005em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span></span></span></span></span> </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">144</span> <span class="bp">self</span><span class="o">.</span><span class="n">w_h</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">ParameterList</span><span class="p">([</span><span class="n">nn</span><span class="o">.</span><span class="n">Parameter</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">hidden_size</span><span class="p">,</span> <span class="n">hidden_size</span><span class="p">))</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">)])</span></pre></div>
</div>
</div>
<div class='section' id='section-12'>
<div class='docs'>
<div class='section-link'>
<a href='#section-12'>#</a>
</div>
<p>The weight matrices <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.0961079999999999em;vertical-align:-0.247em;"></span><span class="mord"><span class="mord mathnormal" style="margin-right:0.13889em;">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8491079999999999em;"><span style="top:-2.4530000000000003em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">x</span></span></span><span style="top:-3.063em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span></span></span></span> </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">146</span> <span class="bp">self</span><span class="o">.</span><span class="n">w_x</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">ParameterList</span><span class="p">([</span><span class="n">nn</span><span class="o">.</span><span class="n">Parameter</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">zeros</span><span class="p">(</span><span class="n">hidden_size</span><span class="p">,</span> <span class="n">input_size</span><span class="p">))</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">)])</span></pre></div>
</div>
</div>
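<div class='section'>
<div class='docs'>
<p>To see why the weight matrices have these shapes, it helps to sketch how a single gate could combine them with the scaling vectors. The combination rule below (row-wise scaling of each matrix product, plus a dynamic bias) follows the HyperLSTM formulation as we understand it and is stated here as an assumption; all tensors are random stand-ins, whereas the code above initializes the matrices to zeros.</p>
</div>
<div class='code'>

```python
import torch

# Illustrative sizes (assumptions, not taken from the source)
batch, hidden_size, input_size = 3, 8, 5

# Static weight matrices for one gate (random here so the output is non-trivial)
w_h = torch.randn(hidden_size, hidden_size)
w_x = torch.randn(hidden_size, input_size)

# Previous hidden state and current input
h = torch.randn(batch, hidden_size)
x = torch.randn(batch, input_size)

# Stand-ins for the dynamic scaling vectors and bias for this gate
d_h = torch.randn(batch, hidden_size)
d_x = torch.randn(batch, hidden_size)
d_b = torch.randn(batch, hidden_size)

# Each output row of W h and W x is rescaled element-wise per sample
y = (d_h * torch.einsum('ij,bj->bi', w_h, h)
     + d_x * torch.einsum('ij,bj->bi', w_x, x)
     + d_b)
```

</div>
<div class='docs'>
<p>Scaling with a vector of size <code>hidden_size</code> is far cheaper than generating a full weight matrix for every sample, which is the motivation for this design.</p>
</div>
</div>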
<div class='section' id='section-13'>
<div class='docs'>
<div class='section-link'>
<a href='#section-13'>#</a>
</div>
<p>Layer normalization: one for each of the four gates and one for the cell state </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">149</span> <span class="bp">self</span><span class="o">.</span><span class="n">layer_norm</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">ModuleList</span><span class="p">([</span><span class="n">nn</span><span class="o">.</span><span class="n">LayerNorm</span><span class="p">(</span><span class="n">hidden_size</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">)])</span>
<span class="lineno">150</span> <span class="bp">self</span><span class="o">.</span><span class="n">layer_norm_c</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">LayerNorm</span><span class="p">(</span><span class="n">hidden_size</span><span class="p">)</span></pre></div>
</div>
</div>
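<div class='section'>
<div class='docs'>
<p>As a quick illustration of what these layers do, the sketch below applies one <code>nn.LayerNorm</code> per gate to random pre-activations. The sizes and the random inputs are assumptions made for the example. With the default affine parameters (weight one, bias zero), each row comes out with approximately zero mean over the hidden dimension.</p>
</div>
<div class='code'>

```python
import torch
from torch import nn

# Illustrative sizes (assumptions, not taken from the source)
batch, hidden_size = 3, 8

# One normalization layer per gate (i, f, g, o) and one for the cell state
layer_norm = nn.ModuleList([nn.LayerNorm(hidden_size) for _ in range(4)])
layer_norm_c = nn.LayerNorm(hidden_size)

# Random stand-ins for the four gate pre-activations and the cell state
gates = [torch.randn(batch, hidden_size) for _ in range(4)]
c = torch.randn(batch, hidden_size)

# Normalize over the last (hidden) dimension, independently per sample
normed = [layer_norm[i](gates[i]) for i in range(4)]
c_normed = layer_norm_c(c)
```

</div>
</div>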
<div class='section' id='section-14'>
<div class='docs'>
<div class='section-link'>
<a href='#section-14'>#</a>
</div>
<p>The forward pass takes the input <code>x</code>, the previous hidden and cell states <code>h</code> and <code>c</code>, and the previous hidden and cell states <code>h_hat</code> and <code>c_hat</code> of the hyper-network LSTM. </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">152</span> <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">,</span>
<span class="lineno">153</span> <span class="n">h</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">,</span> <span class="n">c</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">,</span>
<span class="lineno">154</span> <span class="n">h_hat</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">,</span> <span class="n">c_hat</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">):</span></pre></div>
</div>
</div>
<div class='section' id='section-15'>
<div class='docs'>
<div class='section-link'>
<a href='#section-15'>#</a>
</div>
<p><span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.84444em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.69444em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord mathnormal">x</span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.22222em;"><span class="mord">^</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:2.40003em;vertical-align:-0.95003em;"></span><span class="minner"><span class="mopen delimcenter" style="top:0em;"><span class="delimsizing size3">(</span></span><span class="mord"><span class="mtable"><span class="col-align-c"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:1.45em;"><span style="top:-3.61em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord coloredeq eqbh" style=""><span class="mord" style=""><span class="mord coloredeq eqbr" style=""><span class="mord mathnormal" style="">h</span></span><span class="msupsub"><span 
class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.301108em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mathnormal mtight" style="">t</span><span class="mbin mtight" style="">−</span><span class="mord mtight" style="">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.208331em;"><span></span></span></span></span></span></span></span></span></span><span style="top:-2.4099999999999997em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord coloredeq eqbm" style=""><span class="mord" style=""><span class="mord mathnormal" style="">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.9500000000000004em;"><span></span></span></span></span></span></span></span><span class="mclose delimcenter" style="top:0em;"><span class="delimsizing size3">)</span></span></span></span></span></span></span> </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">161</span> <span class="n">x_hat</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">cat</span><span class="p">((</span><span class="n">h</span><span class="p">,</span> <span class="n">x</span><span class="p">),</span> <span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span></pre></div>
</div>
</div>
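<div class='section'>
<div class='docs'>
<p>As a minimal sketch of the concatenation above, the following checks the shape of the result. The names <code>batch_size</code>, <code>hidden_size</code> and <code>input_size</code> and their values are hypothetical, chosen only for illustration; they are not taken from this implementation.</p>
</div>
<div class='code'>
<div class="highlight"><pre>
```python
import torch

# Hypothetical sizes, chosen only for illustration
batch_size, hidden_size, input_size = 2, 5, 3

h = torch.zeros(batch_size, hidden_size)  # stands in for h_{t-1}
x = torch.ones(batch_size, input_size)    # stands in for x_t

# Concatenate along the last (feature) dimension, previous state first
x_hat = torch.cat((h, x), dim=-1)

print(x_hat.shape)  # torch.Size([2, 8])
```
</pre></div>
</div>
</div>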
<div class='section' id='section-16'>
<div class='docs'>
<div class='section-link'>
<a href='#section-16'>#</a>
</div>
<p><span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.1523199999999998em;vertical-align:-0.19444em;"></span><span class="mord coloredeq eqbg" style=""><span class="mord" style=""><span class="mord accent" style=""><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.9578799999999998em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord coloredeq eqbr" style=""><span class="mord mathnormal" style="">h</span></span></span><span style="top:-3.26344em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.25em;"><span class="mord" style="">^</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord coloredeq eqbf" style=""><span class="mord" style=""><span class="mord accent" style=""><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.69444em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord mathnormal" style="">c</span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.19444em;"><span class="mord" style="">^</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t 
vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.2078799999999998em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord mathnormal">s</span><span class="mord mathnormal">t</span><span class="mord mathnormal">m</span><span class="mopen">(</span><span class="mord"><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.69444em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord mathnormal">x</span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.22222em;"><span class="mord">^</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span 
class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.9578799999999998em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord coloredeq eqbr" style=""><span class="mord mathnormal" style="">h</span></span></span><span style="top:-3.26344em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.25em;"><span class="mord">^</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.301108em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">t</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.208331em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord accent"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.69444em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord mathnormal">c</span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.19444em;"><span class="mord">^</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.301108em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing 
reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">t</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.208331em;"><span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span> </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">163</span> <span class="n">h_hat</span><span class="p">,</span> <span class="n">c_hat</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">hyper</span><span class="p">(</span><span class="n">x_hat</span><span class="p">,</span> <span class="n">h_hat</span><span class="p">,</span> <span class="n">c_hat</span><span class="p">)</span></pre></div>
</div>
</div>
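<div class='section'>
<div class='docs'>
<p>The step above can be sketched with PyTorch's built-in <code>torch.nn.LSTMCell</code> as a stand-in for <code>self.hyper</code>. This is an assumption made for illustration: the built-in cell takes the state as a tuple <code>(h, c)</code>, whereas the call above passes the hidden and cell states as separate arguments. All sizes, including <code>hyper_size</code>, are hypothetical.</p>
</div>
<div class='code'>
<div class="highlight"><pre>
```python
import torch
from torch import nn

# Hypothetical sizes, chosen only for illustration
batch_size, hidden_size, input_size, hyper_size = 2, 5, 3, 4

# Stand-in for self.hyper: a smaller LSTM cell whose input is [h_{t-1}, x_t]
hyper = nn.LSTMCell(hidden_size + input_size, hyper_size)

x_hat = torch.randn(batch_size, hidden_size + input_size)
h_hat = torch.zeros(batch_size, hyper_size)  # previous hyper hidden state
c_hat = torch.zeros(batch_size, hyper_size)  # previous hyper cell state

# nn.LSTMCell expects the previous state as a tuple and returns the new one
h_hat, c_hat = hyper(x_hat, (h_hat, c_hat))

print(h_hat.shape, c_hat.shape)
```
</pre></div>
</div>
</div>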
<div class='section' id='section-17'>
<div class='docs'>
<div class='section-link'>
<a href='#section-17'>#</a>
</div>
<p><span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.049108em;vertical-align:-0.15em;"></span><span class="mord coloredeq eqbd" style=""><span class="mord" style=""><span class="mord coloredeq eqbn" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999999em;"><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" 
style="">o</span></span></span></span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.2683239999999998em;vertical-align:-0.3013079999999999em;"></span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord coloredeq eqbs" style=""><span class="mord mathnormal" style="">i</span></span><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" 
style="height:0.3013079999999999em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord coloredeq eqbg" style=""><span class="mord" style=""><span class="mord accent" style=""><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.9578799999999998em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord coloredeq eqbr" style=""><span class="mord mathnormal" style="">h</span></span></span><span style="top:-3.26344em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.25em;"><span class="mord" style="">^</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span> </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">166</span> <span class="n">z_h</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">z_h</span><span class="p">(</span><span class="n">h_hat</span><span class="p">)</span><span class="o">.</span><span class="n">chunk</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span></pre></div>
</div>
</div>
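<div class='section'>
<div class='docs'>
<p>A minimal sketch of the projection and split above, assuming the layer emits all four embeddings in a single output of width <code>4 * n_z</code>. The names <code>z_h_layer</code> and <code>n_z</code> and the sizes are hypothetical, used here only to show what <code>chunk(4, dim=-1)</code> returns.</p>
</div>
<div class='code'>
<div class="highlight"><pre>
```python
import torch
from torch import nn

# Hypothetical sizes, chosen only for illustration
batch_size, hyper_size, n_z = 2, 4, 3

# Assumed layer shape: one projection producing all four embeddings at once
z_h_layer = nn.Linear(hyper_size, 4 * n_z)

h_hat = torch.randn(batch_size, hyper_size)

# chunk(4, dim=-1) splits the last dimension into four equal parts,
# read here in the order i, f, g, o
z_h = z_h_layer(h_hat).chunk(4, dim=-1)

print(len(z_h), z_h[0].shape)  # 4 torch.Size([2, 3])
```
</pre></div>
</div>
</div>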
<div class='section' id='section-18'>
<div class='docs'>
<div class='section-link'>
<a href='#section-18'>#</a>
</div>
<p><span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.146108em;vertical-align:-0.247em;"></span><span class="mord coloredeq eqbe" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999998em;"><span style="top:-2.4530000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">x</span></span></span><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" 
style="height:1.2078799999999998em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord coloredeq eqbs" style=""><span class="mord mathnormal" style="">i</span></span><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999998em;"><span style="top:-2.4530000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">x</span></span></span><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord coloredeq eqbg" style=""><span class="mord" style=""><span class="mord accent" style=""><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.9578799999999998em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord coloredeq eqbr" style=""><span class="mord mathnormal" 
style="">h</span></span></span><span style="top:-3.26344em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.25em;"><span class="mord" style="">^</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span> </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">168</span> <span class="n">z_x</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">z_x</span><span class="p">(</span><span class="n">h_hat</span><span class="p">)</span><span class="o">.</span><span class="n">chunk</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span></pre></div>
</div>
</div>
<div class='section' id='section-19'>
<div class='docs'>
<div class='section-link'>
<a href='#section-19'>#</a>
</div>
<p><span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.2683239999999998em;vertical-align:-0.3013079999999999em;"></span><span class="mord coloredeq eqbc" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">b</span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" 
style="height:1.2683239999999998em;vertical-align:-0.3013079999999999em;"></span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord coloredeq eqbs" style=""><span class="mord mathnormal" style="">i</span></span><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">b</span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord coloredeq eqbg" style=""><span class="mord" style=""><span class="mord accent" style=""><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.9578799999999998em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord coloredeq eqbr" style=""><span class="mord 
mathnormal" style="">h</span></span></span><span style="top:-3.26344em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.25em;"><span class="mord" style="">^</span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span> </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">170</span> <span class="n">z_b</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">z_b</span><span class="p">(</span><span class="n">h_hat</span><span class="p">)</span><span class="o">.</span><span class="n">chunk</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="n">dim</span><span class="o">=-</span><span class="mi">1</span><span class="p">)</span></pre></div>
</div>
</div>
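<div class='section'>
<div class='docs'>
<p>The three projections above each apply one linear layer and split its output into four parts. As a sketch of why this is a reasonable design, the following checks that one fused layer followed by <code>chunk</code> gives the same values as four separate linear maps built from slices of the fused weights. The name <code>fused</code> and the sizes are hypothetical, and the layer is assumed to have a bias; the same check holds without one.</p>
</div>
<div class='code'>
<div class="highlight"><pre>
```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
hyper_size, n_z = 4, 3  # hypothetical sizes

fused = nn.Linear(hyper_size, 4 * n_z)
h_hat = torch.randn(2, hyper_size)

# One matrix multiplication, then split into four parts
chunks = fused(h_hat).chunk(4, dim=-1)

matches = []
for k in range(4):
    # Rows of the fused weight matrix that produce the k-th part
    rows = slice(k * n_z, (k + 1) * n_z)
    separate = F.linear(h_hat, fused.weight[rows], fused.bias[rows])
    matches.append(torch.allclose(chunks[k], separate, atol=1e-5))

print(all(matches))
```
</pre></div>
</div>
</div>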
<div class='section' id='section-20'>
<div class='docs'>
<div class='section-link'>
<a href='#section-20'>#</a>
</div>
<p>We calculate <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.65952em;vertical-align:0em;"></span><span class="mord coloredeq eqbs" style=""><span class="mord mathnormal" style="">i</span></span></span></span></span>, <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.8888799999999999em;vertical-align:-0.19444em;"></span><span class="mord coloredeq eqbp" style=""><span class="mord mathnormal" style="margin-right:0.10764em">f</span></span></span></span></span>, <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.625em;vertical-align:-0.19444em;"></span><span class="mord coloredeq eqbq" style=""><span class="mord mathnormal" style="margin-right:0.03588em">g</span></span></span></span></span> and <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.43056em;vertical-align:0em;"></span><span class="mord coloredeq eqbt" style=""><span class="mord mathnormal" style="">o</span></span></span></span></span> in a loop </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">173</span> <span class="n">ifgo</span> <span class="o">=</span> <span class="p">[]</span>
<span class="lineno">174</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">4</span><span class="p">):</span></pre></div>
</div>
</div>
<div class='section' id='section-21'>
<div class='docs'>
<div class='section-link'>
<a href='#section-21'>#</a>
</div>
<p><span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.2683239999999998em;vertical-align:-0.3013079999999999em;"></span><span class="mord coloredeq eqq" style=""><span class="mord" style=""><span class="mord coloredeq eqbo" style=""><span class="mord mathnormal" style="">d</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span><span class="mopen" style="">(</span><span class="mord coloredeq eqbd" style=""><span class="mord" style=""><span class="mord coloredeq eqbn" style=""><span class="mord" style=""><span 
class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999999em;"><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span></span></span></span></span></span><span class="mclose" style="">)</span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" 
style="height:1.2683239999999998em;vertical-align:-0.3013079999999999em;"></span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord coloredeq eqbs" style=""><span class="mord mathnormal" style="">i</span></span><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbo" style=""><span class="mord mathnormal mtight" style="">d</span></span><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord coloredeq eqbd" style=""><span class="mord" style=""><span class="mord coloredeq eqbn" style=""><span class="mord" 
style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="msupsub"><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999999em;"><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span> </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">176</span> <span class="n">d_h</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">d_h</span><span class="p">[</span><span class="n">i</span><span class="p">](</span><span class="n">z_h</span><span class="p">[</span><span class="n">i</span><span class="p">])</span></pre></div>
</div>
</div>
<div class='section' id='section-22'>
<div class='docs'>
<div class='section-link'>
<a href='#section-22'>#</a>
</div>
<p><span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.1491079999999998em;vertical-align:-0.25em;"></span><span class="mord coloredeq eqr" style=""><span class="mord" style=""><span class="mord coloredeq eqbo" style=""><span class="mord mathnormal" style="">d</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999998em;"><span style="top:-2.4530000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">x</span></span></span><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span><span class="mopen" style="">(</span><span class="mord coloredeq eqbe" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span 
class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999998em;"><span style="top:-2.4530000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">x</span></span></span><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span></span><span class="mclose" style="">)</span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1.2683239999999998em;vertical-align:-0.3013079999999999em;"></span><span class="mord mathnormal" style="margin-right:0.01968em;">l</span><span class="mord coloredeq eqbs" style=""><span class="mord mathnormal" style="">i</span></span><span class="mord"><span class="mord mathnormal">n</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span 
class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbo" style=""><span class="mord mathnormal mtight" style="">d</span></span><span class="mord mathnormal mtight">x</span></span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord coloredeq eqbe" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999998em;"><span style="top:-2.4530000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">x</span></span></span><span 
style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span> </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">178</span> <span class="n">d_x</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">d_x</span><span class="p">[</span><span class="n">i</span><span class="p">](</span><span class="n">z_x</span><span class="p">[</span><span class="n">i</span><span class="p">])</span></pre></div>
</div>
</div>
<div class='section' id='section-23'>
<div class='docs'>
<div class='section-link'>
<a href='#section-23'>#</a>
</div>
<span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:4.881048em;vertical-align:-2.190524em;"></span><span class="mord"><span class="mtable"><span class="col-align-r"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.690524em;"><span style="top:-4.723508000000001em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"><span class="mord coloredeq eqbs" style=""><span class="mord mathnormal" style="">i</span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord coloredeq eqbp" style=""><span class="mord mathnormal" style="margin-right:0.10764em">f</span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord coloredeq eqbq" style=""><span class="mord mathnormal" style="margin-right:0.03588em">g</span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord coloredeq eqbt" style=""><span class="mord mathnormal" style="">o</span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mord mathnormal">L</span><span class="mord mathnormal" style="margin-right:0.10903em;">N</span><span class="mopen">(</span></span></span><span style="top:-3.0964920000000005em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">+</span></span></span><span style="top:-1.4694760000000002em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord">+</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" 
style="height:2.190524em;"><span></span></span></span></span></span><span class="col-align-l"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:2.690524em;"><span style="top:-4.723508000000001em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"></span><span class="mord" style="color:lightgreen;"><span class="mord coloredeq eqbo" style=""><span class="mord mathnormal" style="">d</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style="color:lightgreen"><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style="color:lightgreen"><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span><span class="mopen" style="color:lightgreen;">(</span><span 
class="mord coloredeq eqbn" style=""><span class="mord" style=""><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="mclose" style="color:lightgreen;">)</span><span class="mord coloredeq eqbk" style=""><span class="mord" style="">⊙</span></span><span class="mopen" style="color:lightgreen;">(</span><span class="mord coloredeq eqba" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.13889em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight coloredeq eqbr" style=""><span class="mord mathnormal mtight" style="">h</span></span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" 
style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span></span><span class="mord coloredeq eqbh" style=""><span class="mord" style=""><span class="mord coloredeq eqbr" style=""><span class="mord mathnormal" style="">h</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.301108em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mathnormal mtight" style="">t</span><span class="mbin mtight" style="">−</span><span class="mord mtight" style="">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.208331em;"><span></span></span></span></span></span></span></span><span class="mclose" style="color:lightgreen;">)</span></span></span><span style="top:-3.0964920000000005em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"></span><span class="mord" style="color:lightgreen;"><span class="mord coloredeq eqbo" style=""><span class="mord mathnormal" style="">d</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.8991079999999998em;"><span style="top:-2.4530000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span 
class="sizing reset-size6 size3 mtight" style="color:lightgreen"><span class="mord mathnormal mtight" style="">x</span></span></span><span style="top:-3.1130000000000004em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style="color:lightgreen"><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.247em;"><span></span></span></span></span></span></span><span class="mopen" style="color:lightgreen;">(</span><span class="mord" style="color:lightgreen;"><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.151392em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style="color:lightgreen"><span class="mord mathnormal mtight" style="">x</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose" style="color:lightgreen;">)</span><span class="mord coloredeq eqbk" 
style=""><span class="mord" style="">⊙</span></span><span class="mopen" style="color:lightgreen;">(</span><span class="mord coloredeq eqba" style=""><span class="mord" style=""><span class="mord mathnormal" style="margin-right:0.13889em">W</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-left:-0.13889em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">x</span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mtight" style=""><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight" style="">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.3013079999999999em;"><span></span></span></span></span></span></span></span><span class="mord coloredeq eqbm" style=""><span class="mord" style=""><span class="mord mathnormal" style="">x</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span 
style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight" style=""><span class="mord mathnormal mtight" style="">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span><span class="mclose" style="color:lightgreen;">)</span></span></span><span style="top:-1.4694760000000002em;"><span class="pstrut" style="height:3em;"></span><span class="mord"><span class="mord"></span><span class="mord"><span class="mord coloredeq eqbo" style=""><span class="mord mathnormal" style="">d</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.9670159999999998em;"><span style="top:-2.3986920000000005em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">b</span></span></span><span style="top:-3.180908em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mtight coloredeq eqbs" style=""><span class="mord mathnormal mtight" style="">i</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbp" style=""><span class="mord mathnormal mtight" style="margin-right:0.10764em">f</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbq" style=""><span class="mord mathnormal mtight" style="margin-right:0.03588em">g</span></span><span class="mpunct mtight">,</span><span class="mord mtight coloredeq eqbt" style=""><span class="mord mathnormal mtight" style="">o</span></span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" 
style="height:0.3013079999999999em;"><span></span></span></span></span></span></span><span class="mopen">(</span><span class="mord"><span class="mord coloredeq eqbu" style=""><span class="mord mathnormal" style="margin-right:0.04398em">z</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.33610799999999996em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">b</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">))</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:2.190524em;"><span></span></span></span></span></span></span></span></span></span></span></span><p> </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">185</span> <span class="n">y</span> <span class="o">=</span> <span class="n">d_h</span> <span class="o">*</span> <span class="n">torch</span><span class="o">.</span><span class="n">einsum</span><span class="p">(</span><span class="s1">&#39;ij,bj-&gt;bi&#39;</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">w_h</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">h</span><span class="p">)</span> <span class="o">+</span> \
<span class="lineno">186</span> <span class="n">d_x</span> <span class="o">*</span> <span class="n">torch</span><span class="o">.</span><span class="n">einsum</span><span class="p">(</span><span class="s1">&#39;ij,bj-&gt;bi&#39;</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">w_x</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">x</span><span class="p">)</span> <span class="o">+</span> \
<span class="lineno">187</span> <span class="bp">self</span><span class="o">.</span><span class="n">d_b</span><span class="p">[</span><span class="n">i</span><span class="p">](</span><span class="n">z_b</span><span class="p">[</span><span class="n">i</span><span class="p">])</span>
<span class="lineno">188</span>
<span class="lineno">189</span> <span class="n">ifgo</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">layer_norm</span><span class="p">[</span><span class="n">i</span><span class="p">](</span><span class="n">y</span><span class="p">))</span></pre></div>
</div>
</div>
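<div class='section'>
<div class='docs'>
<p>The element-wise scaling in the gate computation above can be read as giving every sample in the batch its own weight matrix: multiplying the product of W_h and the previous hidden state by d_h(z_h) gives the same result as first scaling each row of W_h by the matching entry of d_h(z_h). The following is a minimal sketch that checks this numerically. The tensor names, sizes, and random values are illustrative assumptions, not values taken from the module above.</p>
</div>
<div class='code'>

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration
batch_size, hidden_size = 3, 4

# Random stand-ins for one gate's weight matrix W_h, the previous
# hidden state h, and the dynamic scaling vector d_h(z_h)
w_h = torch.randn(hidden_size, hidden_size)
h = torch.randn(batch_size, hidden_size)
d_h = torch.randn(batch_size, hidden_size)

# Form used in the gate computation: scale the output of W_h h element-wise
scaled = d_h * torch.einsum('ij,bj->bi', w_h, h)

# Explicit form: build a per-sample weight matrix diag(d_h) W_h,
# then apply it to h
per_sample_w = torch.einsum('bi,ij->bij', d_h, w_h)
explicit = torch.einsum('bij,bj->bi', per_sample_w, h)

# Both forms should agree up to floating point error
same = torch.allclose(scaled, explicit, atol=1e-5)
```

</div>
</div>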
<div class='section' id='section-24'>
<div class='docs'>
<div class='section-link'>
<a href='#section-24'>#</a>
</div>
<p><span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.8888799999999999em;vertical-align:-0.19444em;"></span><span class="mord"><span class="mord coloredeq eqbs" style=""><span class="mord mathnormal" style="">i</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord coloredeq eqbp" style=""><span class="mord mathnormal" style="margin-right:0.10764em">f</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord coloredeq eqbq" style=""><span class="mord mathnormal" style="margin-right:0.03588em">g</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span 
style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord"><span class="mord coloredeq eqbt" style=""><span class="mord mathnormal" style="">o</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span></span></span></span></span> </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">192</span> <span class="n">i</span><span class="p">,</span> <span class="n">f</span><span class="p">,</span> <span class="n">g</span><span class="p">,</span> <span class="n">o</span> <span class="o">=</span> <span class="n">ifgo</span></pre></div>
</div>
</div>
<div class='section' id='section-25'>
<div class='docs'>
<div class='section-link'>
<a href='#section-25'>#</a>
</div>
<p><span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.58056em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord mathnormal">c</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.03588em;">σ</span><span class="mopen">(</span><span class="mord"><span class="mord coloredeq eqbp" style=""><span class="mord mathnormal" style="margin-right:0.10764em">f</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mord coloredeq eqbk" style=""><span class="mord" style="">⊙</span></span><span class="mord"><span class="mord mathnormal">c</span><span class="msupsub"><span class="vlist-t 
vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.301108em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mtight"><span class="mord mathnormal mtight">t</span><span class="mbin mtight">−</span><span class="mord mtight">1</span></span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.208331em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2222222222222222em;"></span><span class="mbin">+</span><span class="mspace" style="margin-right:0.2222222222222222em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.03588em;">σ</span><span class="mopen">(</span><span class="mord"><span class="mord coloredeq eqbs" style=""><span class="mord mathnormal" style="">i</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mord coloredeq eqbk" style=""><span class="mord" style="">⊙</span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mop">tanh</span><span class="mopen">(</span><span class="mord"><span class="mord coloredeq eqbq" style=""><span class="mord mathnormal" style="margin-right:0.03588em">g</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span 
class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">)</span></span></span></span></span> </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">195</span> <span class="n">c_next</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">sigmoid</span><span class="p">(</span><span class="n">f</span><span class="p">)</span> <span class="o">*</span> <span class="n">c</span> <span class="o">+</span> <span class="n">torch</span><span class="o">.</span><span class="n">sigmoid</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="o">*</span> <span class="n">torch</span><span class="o">.</span><span class="n">tanh</span><span class="p">(</span><span class="n">g</span><span class="p">)</span></pre></div>
</div>
</div>
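<div class='section'>
<div class='docs'>
<p>The cell state update above can be checked by hand on a single unit. This is a minimal scalar sketch in plain Python, not part of the implementation; the helper names <code class="highlight">sigmoid</code> and <code class="highlight">cell_update</code> are hypothetical and stand in for the element-wise tensor operations.</p>
</div>
<div class='code'>

```python
import math


def sigmoid(v: float) -> float:
    """Logistic function, the scalar analogue of torch.sigmoid."""
    return 1.0 / (1.0 + math.exp(-v))


def cell_update(f: float, i: float, g: float, c_prev: float) -> float:
    """c_t = sigmoid(f_t) * c_{t-1} + sigmoid(i_t) * tanh(g_t) for one unit."""
    return sigmoid(f) * c_prev + sigmoid(i) * math.tanh(g)


# Zero pre-activations: both gates are 0.5 and the candidate tanh(0) is 0,
# so half of the previous cell state is kept and nothing new is written.
c_half = cell_update(f=0.0, i=0.0, g=0.0, c_prev=2.0)

# Strongly positive forget gate and strongly negative input gate:
# the previous cell state passes through almost unchanged.
c_keep = cell_update(f=20.0, i=-20.0, g=3.0, c_prev=2.0)
```

</div>
</div>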
<div class='section' id='section-26'>
<div class='docs'>
<div class='section-link'>
<a href='#section-26'>#</a>
</div>
<p><span class="katex-display"><span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.84444em;vertical-align:-0.15em;"></span><span class="mord"><span class="mord coloredeq eqbr" style=""><span class="mord mathnormal" style="">h</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mspace" style="margin-right:0.2777777777777778em;"></span><span class="mrel">=</span><span class="mspace" style="margin-right:0.2777777777777778em;"></span></span><span class="base"><span class="strut" style="height:1em;vertical-align:-0.25em;"></span><span class="mord mathnormal" style="margin-right:0.03588em;">σ</span><span class="mopen">(</span><span class="mord"><span class="mord coloredeq eqbt" style=""><span class="mord mathnormal" style="">o</span></span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">)</span><span class="mord coloredeq eqbk" style=""><span class="mord" style="">⊙</span></span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span 
class="mop">tanh</span><span class="mopen">(</span><span class="mord mathnormal">L</span><span class="mord mathnormal" style="margin-right:0.10903em;">N</span><span class="mopen">(</span><span class="mord"><span class="mord mathnormal">c</span><span class="msupsub"><span class="vlist-t vlist-t2"><span class="vlist-r"><span class="vlist" style="height:0.2805559999999999em;"><span style="top:-2.5500000000000003em;margin-left:0em;margin-right:0.05em;"><span class="pstrut" style="height:2.7em;"></span><span class="sizing reset-size6 size3 mtight"><span class="mord mathnormal mtight">t</span></span></span></span><span class="vlist-s"></span></span><span class="vlist-r"><span class="vlist" style="height:0.15em;"><span></span></span></span></span></span></span><span class="mclose">))</span></span></span></span></span> </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">198</span> <span class="n">h_next</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">sigmoid</span><span class="p">(</span><span class="n">o</span><span class="p">)</span> <span class="o">*</span> <span class="n">torch</span><span class="o">.</span><span class="n">tanh</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">layer_norm_c</span><span class="p">(</span><span class="n">c_next</span><span class="p">))</span>
<span class="lineno">199</span>
<span class="lineno">200</span> <span class="k">return</span> <span class="n">h_next</span><span class="p">,</span> <span class="n">c_next</span><span class="p">,</span> <span class="n">h_hat</span><span class="p">,</span> <span class="n">c_hat</span></pre></div>
</div>
</div>
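<div class='section'>
<div class='docs'>
<p>As a rough illustration of the hidden state update above, the following plain Python sketch applies it to one small vector. It assumes <code class="highlight">layer_norm_c</code> behaves like a standard layer normalization over the hidden units, and it leaves out the learned gain and bias; <code class="highlight">layer_norm</code> and <code class="highlight">hidden_update</code> are hypothetical helper names.</p>
</div>
<div class='code'>

```python
import math
from typing import List


def layer_norm(v: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize a vector to zero mean and unit variance (no learned gain or bias)."""
    mean = sum(v) / len(v)
    var = sum((e - mean) ** 2 for e in v) / len(v)
    return [(e - mean) / math.sqrt(var + eps) for e in v]


def sigmoid(v: float) -> float:
    """Logistic function, the scalar analogue of torch.sigmoid."""
    return 1.0 / (1.0 + math.exp(-v))


def hidden_update(o: List[float], c_next: List[float]) -> List[float]:
    """h_t = sigmoid(o_t) * tanh(LN(c_t)), element-wise over the hidden units."""
    normed = layer_norm(c_next)
    return [sigmoid(o_k) * math.tanh(n_k) for o_k, n_k in zip(o, normed)]


# Example values for one sample with four hidden units (illustrative only).
c_next = [0.5, -1.0, 3.0, 1.5]
o = [0.0, 2.0, -2.0, 0.5]
h_next = hidden_update(o, c_next)
```

</div>
</div>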
<div class='section' id='section-27'>
<div class='docs doc-strings'>
<div class='section-link'>
<a href='#section-27'>#</a>
</div>
<h1>HyperLSTM module</h1>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">203</span><span class="k">class</span> <span class="nc">HyperLSTM</span><span class="p">(</span><span class="n">Module</span><span class="p">):</span></pre></div>
</div>
</div>
<div class='section' id='section-28'>
<div class='docs doc-strings'>
<div class='section-link'>
<a href='#section-28'>#</a>
</div>
<p>Create a network with <code class="highlight"><span></span><span class="n">n_layers</span></code>
stacked HyperLSTM layers.</p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">208</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">input_size</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">hidden_size</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">hyper_size</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">n_z</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span> <span class="n">n_layers</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span></pre></div>
</div>
</div>
<div class='section' id='section-29'>
<div class='docs'>
<div class='section-link'>
<a href='#section-29'>#</a>
</div>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">213</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span></pre></div>
</div>
</div>
<div class='section' id='section-30'>
<div class='docs'>
<div class='section-link'>
<a href='#section-30'>#</a>
</div>
<p>Store the sizes needed to initialize the state </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">216</span> <span class="bp">self</span><span class="o">.</span><span class="n">n_layers</span> <span class="o">=</span> <span class="n">n_layers</span>
<span class="lineno">217</span> <span class="bp">self</span><span class="o">.</span><span class="n">hidden_size</span> <span class="o">=</span> <span class="n">hidden_size</span>
<span class="lineno">218</span> <span class="bp">self</span><span class="o">.</span><span class="n">hyper_size</span> <span class="o">=</span> <span class="n">hyper_size</span></pre></div>
</div>
</div>
<div class='section' id='section-31'>
<div class='docs'>
<div class='section-link'>
<a href='#section-31'>#</a>
</div>
<p>Create cells for each layer. Note that only the first layer gets the input directly. The rest of the layers get their input from the layer below </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">222</span> <span class="bp">self</span><span class="o">.</span><span class="n">cells</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">ModuleList</span><span class="p">([</span><span class="n">HyperLSTMCell</span><span class="p">(</span><span class="n">input_size</span><span class="p">,</span> <span class="n">hidden_size</span><span class="p">,</span> <span class="n">hyper_size</span><span class="p">,</span> <span class="n">n_z</span><span class="p">)]</span> <span class="o">+</span>
<span class="lineno">223</span> <span class="p">[</span><span class="n">HyperLSTMCell</span><span class="p">(</span><span class="n">hidden_size</span><span class="p">,</span> <span class="n">hidden_size</span><span class="p">,</span> <span class="n">hyper_size</span><span class="p">,</span> <span class="n">n_z</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span>
<span class="lineno">224</span> <span class="nb">range</span><span class="p">(</span><span class="n">n_layers</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)])</span></pre></div>
</div>
</div>
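<div class='section'>
<div class='docs'>
<p>The sizing rule for the stacked cells can be sketched without any model code. The helper <code class="highlight">cell_sizes</code> below is hypothetical and only lists the input and hidden sizes each layer would be built with.</p>
</div>
<div class='code'>

```python
from typing import List, Tuple


def cell_sizes(input_size: int, hidden_size: int, n_layers: int) -> List[Tuple[int, int]]:
    """Return (input features, hidden features) for each stacked layer."""
    # The first layer consumes the raw input.
    sizes = [(input_size, hidden_size)]
    # Every later layer consumes the hidden state of the layer below it.
    sizes += [(hidden_size, hidden_size) for _ in range(n_layers - 1)]
    return sizes


sizes = cell_sizes(input_size=10, hidden_size=20, n_layers=3)
```

</div>
</div>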
<div class='section' id='section-32'>
<div class='docs doc-strings'>
<div class='section-link'>
<a href='#section-32'>#</a>
</div>
<ul><li><code class="highlight"><span></span><span class="n">x</span></code>
has shape <code class="highlight"><span></span><span class="p">[</span><span class="n">n_steps</span><span class="p">,</span> <span class="n">batch_size</span><span class="p">,</span> <span class="n">input_size</span><span class="p">]</span></code>
and </li>
<li><code class="highlight"><span></span><span class="n">state</span></code>
is a tuple of <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.1523199999999998em;vertical-align:-0.19444em;"></span><span class="mord coloredeq eqbl" style=""><span class="mord coloredeq eqbr" style=""><span class="mord mathnormal" style="">h</span></span><span class="mpunct" style="">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathnormal" style="">c</span></span><span class="mpunct">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord coloredeq eqy" style=""><span class="mord accent" style=""><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.9578799999999998em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord coloredeq eqbr" style=""><span class="mord mathnormal" style="">h</span></span></span><span style="top:-3.26344em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.25em;"><span class="mord" style="">^</span></span></span></span></span></span></span><span class="mpunct" style="">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord accent" style=""><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.69444em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord mathnormal" style="">c</span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.19444em;"><span class="mord" style="">^</span></span></span></span></span></span></span></span></span></span></span>. 
<span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.8888799999999999em;vertical-align:-0.19444em;"></span><span class="mord coloredeq eqbl" style=""><span class="mord coloredeq eqbr" style=""><span class="mord mathnormal" style="">h</span></span><span class="mpunct" style="">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord mathnormal" style="">c</span></span></span></span></span> have shape <code class="highlight"><span></span><span class="p">[</span><span class="n">batch_size</span><span class="p">,</span> <span class="n">hidden_size</span><span class="p">]</span></code>
and <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:1.1523199999999998em;vertical-align:-0.19444em;"></span><span class="mord coloredeq eqy" style=""><span class="mord accent" style=""><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.9578799999999998em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord coloredeq eqbr" style=""><span class="mord mathnormal" style="">h</span></span></span><span style="top:-3.26344em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.25em;"><span class="mord" style="">^</span></span></span></span></span></span></span><span class="mpunct" style="">,</span><span class="mspace" style="margin-right:0.16666666666666666em;"></span><span class="mord accent" style=""><span class="vlist-t"><span class="vlist-r"><span class="vlist" style="height:0.69444em;"><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="mord mathnormal" style="">c</span></span><span style="top:-3em;"><span class="pstrut" style="height:3em;"></span><span class="accent-body" style="left:-0.19444em;"><span class="mord" style="">^</span></span></span></span></span></span></span></span></span></span></span> have shape <code class="highlight"><span></span><span class="p">[</span><span class="n">batch_size</span><span class="p">,</span> <span class="n">hyper_size</span><span class="p">]</span></code>
.</li></ul>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">226</span> <span class="k">def</span> <span class="nf">forward</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">,</span>
<span class="lineno">227</span> <span class="n">state</span><span class="p">:</span> <span class="n">Optional</span><span class="p">[</span><span class="n">Tuple</span><span class="p">[</span><span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">,</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">,</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">,</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">]]</span> <span class="o">=</span> <span class="kc">None</span><span class="p">):</span></pre></div>
</div>
</div>
<div class='section' id='section-33'>
<div class='docs'>
<div class='section-link'>
<a href='#section-33'>#</a>
</div>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">234</span> <span class="n">n_steps</span><span class="p">,</span> <span class="n">batch_size</span> <span class="o">=</span> <span class="n">x</span><span class="o">.</span><span class="n">shape</span><span class="p">[:</span><span class="mi">2</span><span class="p">]</span></pre></div>
</div>
</div>
<div class='section' id='section-34'>
<div class='docs'>
<div class='section-link'>
<a href='#section-34'>#</a>
</div>
<p>Initialize the state with zeros if <code class="highlight"><span></span><span class="kc">None</span></code>
</p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">237</span> <span class="k">if</span> <span class="n">state</span> <span class="ow">is</span> <span class="kc">None</span><span class="p">:</span>
<span class="lineno">238</span> <span class="n">h</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span><span class="o">.</span><span class="n">new_zeros</span><span class="p">(</span><span class="n">batch_size</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">hidden_size</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">n_layers</span><span class="p">)]</span>
<span class="lineno">239</span> <span class="n">c</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span><span class="o">.</span><span class="n">new_zeros</span><span class="p">(</span><span class="n">batch_size</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">hidden_size</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">n_layers</span><span class="p">)]</span>
<span class="lineno">240</span> <span class="n">h_hat</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span><span class="o">.</span><span class="n">new_zeros</span><span class="p">(</span><span class="n">batch_size</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">hyper_size</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">n_layers</span><span class="p">)]</span>
<span class="lineno">241</span> <span class="n">c_hat</span> <span class="o">=</span> <span class="p">[</span><span class="n">x</span><span class="o">.</span><span class="n">new_zeros</span><span class="p">(</span><span class="n">batch_size</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">hyper_size</span><span class="p">)</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">n_layers</span><span class="p">)]</span></pre></div>
</div>
</div>
<div class='section' id='section-35'>
<div class='docs'>
<div class='section-link'>
<a href='#section-35'>#</a>
</div>
<p> </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">243</span> <span class="k">else</span><span class="p">:</span>
<span class="lineno">244</span> <span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h_hat</span><span class="p">,</span> <span class="n">c_hat</span><span class="p">)</span> <span class="o">=</span> <span class="n">state</span></pre></div>
</div>
</div>
<div class='section' id='section-36'>
<div class='docs'>
<div class='section-link'>
<a href='#section-36'>#</a>
</div>
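<p>Unstack the tensors (the reverse of <code class="highlight"><span></span><span class="n">torch</span><span class="o">.</span><span class="n">stack</span></code>) to get a list with the state of each layer</p>
<p>📝 You could work with the stacked tensors directly, but lists are easier to debug </p>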
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">248</span> <span class="n">h</span><span class="p">,</span> <span class="n">c</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">unbind</span><span class="p">(</span><span class="n">h</span><span class="p">)),</span> <span class="nb">list</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">unbind</span><span class="p">(</span><span class="n">c</span><span class="p">))</span>
<span class="lineno">249</span> <span class="n">h_hat</span><span class="p">,</span> <span class="n">c_hat</span> <span class="o">=</span> <span class="nb">list</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">unbind</span><span class="p">(</span><span class="n">h_hat</span><span class="p">)),</span> <span class="nb">list</span><span class="p">(</span><span class="n">torch</span><span class="o">.</span><span class="n">unbind</span><span class="p">(</span><span class="n">c_hat</span><span class="p">))</span></pre></div>
</div>
</div>
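<div class='section'>
<div class='docs'>
<p>Both branches end with one list entry per layer. The sketch below, with made-up sizes, shows that <code class="highlight">torch.unbind</code> and <code class="highlight">torch.stack</code> along the first dimension are inverses, and that the zero state built layer by layer stacks to the same packed shape.</p>
</div>
<div class='code'>

```python
import torch

# Illustrative sizes, not taken from the implementation.
n_layers, batch_size, hidden_size = 3, 2, 4

# A packed state, as returned by a previous call: one slice per layer.
packed = torch.arange(n_layers * batch_size * hidden_size, dtype=torch.float32)
packed = packed.view(n_layers, batch_size, hidden_size)

# torch.unbind splits along dim 0 into a tuple of per-layer tensors;
# wrapping it in a list lets each layer's entry be reassigned.
per_layer = list(torch.unbind(packed))

# torch.stack along dim 0 is the inverse and restores the packed layout.
restacked = torch.stack(per_layer)

# The zero initial state is built as a list with one tensor per layer,
# inheriting dtype and device from the input x.
x = torch.zeros(5, batch_size, 7)
zeros = [x.new_zeros(batch_size, hidden_size) for _ in range(n_layers)]
```

</div>
</div>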
<div class='section' id='section-37'>
<div class='docs'>
<div class='section-link'>
<a href='#section-37'>#</a>
</div>
<p>Collect the outputs of the final layer at each step </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">252</span> <span class="n">out</span> <span class="o">=</span> <span class="p">[]</span>
<span class="lineno">253</span> <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n_steps</span><span class="p">):</span></pre></div>
</div>
</div>
<div class='section' id='section-38'>
<div class='docs'>
<div class='section-link'>
<a href='#section-38'>#</a>
</div>
<p>Input to the first layer is the input itself </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">255</span> <span class="n">inp</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="n">t</span><span class="p">]</span></pre></div>
</div>
</div>
<div class='section' id='section-39'>
<div class='docs'>
<div class='section-link'>
<a href='#section-39'>#</a>
</div>
<p>Loop through the layers </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">257</span> <span class="k">for</span> <span class="n">layer</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">n_layers</span><span class="p">):</span></pre></div>
</div>
</div>
<div class='section' id='section-40'>
<div class='docs'>
<div class='section-link'>
<a href='#section-40'>#</a>
</div>
<p>Run the cell of this layer to get its updated state </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">259</span> <span class="n">h</span><span class="p">[</span><span class="n">layer</span><span class="p">],</span> <span class="n">c</span><span class="p">[</span><span class="n">layer</span><span class="p">],</span> <span class="n">h_hat</span><span class="p">[</span><span class="n">layer</span><span class="p">],</span> <span class="n">c_hat</span><span class="p">[</span><span class="n">layer</span><span class="p">]</span> <span class="o">=</span> \
<span class="lineno">260</span> <span class="bp">self</span><span class="o">.</span><span class="n">cells</span><span class="p">[</span><span class="n">layer</span><span class="p">](</span><span class="n">inp</span><span class="p">,</span> <span class="n">h</span><span class="p">[</span><span class="n">layer</span><span class="p">],</span> <span class="n">c</span><span class="p">[</span><span class="n">layer</span><span class="p">],</span> <span class="n">h_hat</span><span class="p">[</span><span class="n">layer</span><span class="p">],</span> <span class="n">c_hat</span><span class="p">[</span><span class="n">layer</span><span class="p">])</span></pre></div>
</div>
</div>
<div class='section' id='section-41'>
<div class='docs'>
<div class='section-link'>
<a href='#section-41'>#</a>
</div>
<p>Input to the next layer is the hidden state of this layer </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">262</span> <span class="n">inp</span> <span class="o">=</span> <span class="n">h</span><span class="p">[</span><span class="n">layer</span><span class="p">]</span></pre></div>
</div>
</div>
<div class='section' id='section-42'>
<div class='docs'>
<div class='section-link'>
<a href='#section-42'>#</a>
</div>
<p>Collect the output <span class="katex"><span aria-hidden="true" class="katex-html"><span class="base"><span class="strut" style="height:0.69444em;vertical-align:0em;"></span><span class="mord coloredeq eqbr" style=""><span class="mord mathnormal" style="">h</span></span></span></span></span> of the final layer </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">264</span> <span class="n">out</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">h</span><span class="p">[</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span></pre></div>
</div>
</div>
<div class='section' id='section-43'>
<div class='docs'>
<div class='section-link'>
<a href='#section-43'>#</a>
</div>
<p>Stack the outputs and states </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">267</span> <span class="n">out</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">stack</span><span class="p">(</span><span class="n">out</span><span class="p">)</span>
<span class="lineno">268</span> <span class="n">h</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">stack</span><span class="p">(</span><span class="n">h</span><span class="p">)</span>
<span class="lineno">269</span> <span class="n">c</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">stack</span><span class="p">(</span><span class="n">c</span><span class="p">)</span>
<span class="lineno">270</span> <span class="n">h_hat</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">stack</span><span class="p">(</span><span class="n">h_hat</span><span class="p">)</span>
<span class="lineno">271</span> <span class="n">c_hat</span> <span class="o">=</span> <span class="n">torch</span><span class="o">.</span><span class="n">stack</span><span class="p">(</span><span class="n">c_hat</span><span class="p">)</span></pre></div>
</div>
</div>
<div class='section' id='section-44'>
<div class='docs'>
<div class='section-link'>
<a href='#section-44'>#</a>
</div>
<p> </p>
</div>
<div class='code'>
<div class="highlight"><pre><span class="lineno">274</span> <span class="k">return</span> <span class="n">out</span><span class="p">,</span> <span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">h_hat</span><span class="p">,</span> <span class="n">c_hat</span><span class="p">)</span></pre></div>
</div>
</div>
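<div class='section'>
<div class='docs'>
<p>The step and layer loops of the forward pass can be tried in isolation. This sketch swaps in PyTorch's plain <code class="highlight">nn.LSTMCell</code> for <code class="highlight">HyperLSTMCell</code>, so it carries no hyper state; that substitution and all sizes are assumptions made to keep the example self-contained.</p>
</div>
<div class='code'>

```python
import torch
from torch import nn

# Illustrative sizes, not taken from the implementation.
input_size, hidden_size, n_layers = 5, 8, 2
n_steps, batch_size = 4, 3

# Stand-in cells: plain nn.LSTMCell instead of HyperLSTMCell (an assumption;
# there is no hyper network state h_hat, c_hat here).
cells = nn.ModuleList([nn.LSTMCell(input_size, hidden_size)] +
                      [nn.LSTMCell(hidden_size, hidden_size) for _ in range(n_layers - 1)])

x = torch.randn(n_steps, batch_size, input_size)
h = [x.new_zeros(batch_size, hidden_size) for _ in range(n_layers)]
c = [x.new_zeros(batch_size, hidden_size) for _ in range(n_layers)]

out = []
for t in range(n_steps):
    # The first layer reads the input at step t.
    inp = x[t]
    for layer in range(n_layers):
        h[layer], c[layer] = cells[layer](inp, (h[layer], c[layer]))
        # The next layer reads this layer's hidden state.
        inp = h[layer]
    # Only the top layer's hidden state is collected as output.
    out.append(h[-1])

out = torch.stack(out)                  # [n_steps, batch_size, hidden_size]
h, c = torch.stack(h), torch.stack(c)   # [n_layers, batch_size, hidden_size]
```

</div>
</div>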
<div class='footer'>
<a href="https://papers.labml.ai">Trending Research Papers</a>
<a href="https://labml.ai">labml.ai</a>
</div>
</div>
<script src="../interactive.js?v=1"></script>
<script>
function handleImages() {
var images = document.querySelectorAll('p>img')
for (var i = 0; i < images.length; ++i) {
handleImage(images[i])
}
}
function handleImage(img) {
img.parentElement.style.textAlign = 'center'
var modal = document.createElement('div')
modal.id = 'modal'
var modalContent = document.createElement('div')
modal.appendChild(modalContent)
var modalImage = document.createElement('img')
modalContent.appendChild(modalImage)
var span = document.createElement('span')
span.classList.add('close')
span.textContent = 'x'
modal.appendChild(span)
img.onclick = function () {
console.log('clicked')
document.body.appendChild(modal)
modalImage.src = img.src
}
span.onclick = function () {
document.body.removeChild(modal)
}
}
handleImages()
</script>
</body>
</html>