Mirror of https://github.com/labmlai/annotated_deep_learning_paper_implementations.git
Synced 2025-08-26 16:50:39 +08:00

Commit: experiment notebook
@@ -75,8 +75,8 @@
 <h1>Fast weights transformer</h1>
 <p>The paper
 <a href="https://arxiv.org/abs/2102.11174">Linear Transformers Are Secretly Fast Weight Memory Systems in PyTorch</a>
-finds similarities between linear self attention and fast weight systems,
-and makes modifications to self attention update rule based on that.
+finds similarities between linear self-attention and fast weight systems
+and makes modifications to self-attention update rule based on that.
 It also introduces a simpler, yet effective kernel function.</p>
 <h2>Fast weights</h2>
 <p>Consider a sequence of inputs $\big\{x^{(i)}\big\}^L_{i=1}$ of length $L$
@@ -140,8 +140,8 @@ y^{(i)} &= \frac{1}{z^{(i)} \cdot \color{lightgreen}{\phi(q^{(i)})}}
 <p>The paper introduces a new linear attention projection function $\color{lightgreen}{\phi}$,
 a new update rule for $\color{cyan}{W^{(i)}} = f(\color{cyan}{W^{(i-1)}})$ and changes the normalization
 $\frac{1}{z^{(i)} \cdot \color{lightgreen}{\phi(q^{(i)})}}$</p>
-<p>Here’s <a href="experiment.html">the training code</a> and a notebook for training a fast weights
-transformer on Tiny Shakespeare dataset.</p>
+<p>Here are <a href="experiment.html">the training code</a> and a notebook for training a fast weights
+transformer on the Tiny Shakespeare dataset.</p>
 <p><a href="https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/transformers/fast_weights/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a>
 <a href="https://app.labml.ai/run/928aadc0846c11eb85710242ac1c0002"><img alt="View Run" src="https://img.shields.io/badge/labml-experiment-brightgreen" /></a></p>
 </div>
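The additive fast-weight update this introduction refers to can be sketched outside the diff. This is a minimal NumPy stand-in, not the repository's implementation: `phi` and `fast_weight_step` are illustrative names, and the feature map shown is the plain ELU+1 map of linear attention rather than the paper's DPFP kernel.

```python
import numpy as np

def phi(x: np.ndarray) -> np.ndarray:
    # Illustrative positive feature map (ELU + 1), as used in linear attention
    return np.where(x > 0, x + 1.0, np.exp(x))

def fast_weight_step(W, z, k, v, q):
    """One step of the basic additive fast-weight memory:
    W^(i) = W^(i-1) + v^(i) phi(k^(i))^T, with accumulated normaliser z."""
    W = W + np.outer(v, phi(k))            # write the key->value association into memory
    z = z + phi(k)                         # accumulate the normaliser
    y = W @ phi(q) / (z @ phi(q) + 1e-9)   # read out with the query
    return W, z, y

d = 4
W = np.zeros((d, d))
z = np.zeros(d)
k = v = q = np.ones(d)
W, z, y = fast_weight_step(W, z, k, v, q)
```

Querying with the same key that was just written retrieves (approximately) the stored value; the paper's contribution is to replace this purely additive rule with one that can also overwrite old associations.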
@@ -669,12 +669,10 @@ y^{(i)} &= \color{cyan}{W^{(i)}} \color{lightgreen}{\phi'(q^{(i)})}
 <div class='section-link'>
 <a href='#section-40'>#</a>
 </div>
-
+<p>Fast weights attention module</p>
 </div>
 <div class='code'>
-<div class="highlight"><pre><span class="lineno">281</span> <span class="bp">self</span><span class="o">.</span><span class="n">attn</span> <span class="o">=</span> <span class="n">attn</span>
-<span class="lineno">282</span> <span class="bp">self</span><span class="o">.</span><span class="n">feed_forward</span> <span class="o">=</span> <span class="n">feed_forward</span>
-<span class="lineno">283</span> <span class="bp">self</span><span class="o">.</span><span class="n">dropout</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Dropout</span><span class="p">(</span><span class="n">dropout_prob</span><span class="p">)</span></pre></div>
+<div class="highlight"><pre><span class="lineno">281</span> <span class="bp">self</span><span class="o">.</span><span class="n">attn</span> <span class="o">=</span> <span class="n">attn</span></pre></div>
 </div>
 </div>
 <div class='section' id='section-41'>
@@ -682,11 +680,10 @@ y^{(i)} &= \color{cyan}{W^{(i)}} \color{lightgreen}{\phi'(q^{(i)})}
 <div class='section-link'>
 <a href='#section-41'>#</a>
 </div>
-<p>Normalization layers</p>
+<p>Feed-forward network</p>
 </div>
 <div class='code'>
-<div class="highlight"><pre><span class="lineno">286</span> <span class="bp">self</span><span class="o">.</span><span class="n">norm_self_attn</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">LayerNorm</span><span class="p">([</span><span class="n">d_model</span><span class="p">])</span>
-<span class="lineno">287</span> <span class="bp">self</span><span class="o">.</span><span class="n">norm_ff</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">LayerNorm</span><span class="p">([</span><span class="n">d_model</span><span class="p">])</span></pre></div>
+<div class="highlight"><pre><span class="lineno">283</span> <span class="bp">self</span><span class="o">.</span><span class="n">feed_forward</span> <span class="o">=</span> <span class="n">feed_forward</span></pre></div>
 </div>
 </div>
 <div class='section' id='section-42'>
@@ -694,11 +691,10 @@ y^{(i)} &= \color{cyan}{W^{(i)}} \color{lightgreen}{\phi'(q^{(i)})}
 <div class='section-link'>
 <a href='#section-42'>#</a>
 </div>
-
+<p>Dropout layer</p>
 </div>
 <div class='code'>
-<div class="highlight"><pre><span class="lineno">289</span> <span class="k">def</span> <span class="fm">__call__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">):</span>
-<span class="lineno">290</span> <span class="n">attn</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">attn</span><span class="p">(</span><span class="n">x</span><span class="p">)</span></pre></div>
+<div class="highlight"><pre><span class="lineno">285</span> <span class="bp">self</span><span class="o">.</span><span class="n">dropout</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">Dropout</span><span class="p">(</span><span class="n">dropout_prob</span><span class="p">)</span></pre></div>
 </div>
 </div>
 <div class='section' id='section-43'>
@@ -706,10 +702,11 @@ y^{(i)} &= \color{cyan}{W^{(i)}} \color{lightgreen}{\phi'(q^{(i)})}
 <div class='section-link'>
 <a href='#section-43'>#</a>
 </div>
-<p>Add the self attention results</p>
+<p>Normalization layers</p>
 </div>
 <div class='code'>
-<div class="highlight"><pre><span class="lineno">292</span> <span class="n">x</span> <span class="o">=</span> <span class="n">x</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">attn</span><span class="p">)</span></pre></div>
+<div class="highlight"><pre><span class="lineno">288</span> <span class="bp">self</span><span class="o">.</span><span class="n">norm_self_attn</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">LayerNorm</span><span class="p">([</span><span class="n">d_model</span><span class="p">])</span>
+<span class="lineno">289</span> <span class="bp">self</span><span class="o">.</span><span class="n">norm_ff</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">LayerNorm</span><span class="p">([</span><span class="n">d_model</span><span class="p">])</span></pre></div>
 </div>
 </div>
 <div class='section' id='section-44'>
@@ -717,10 +714,10 @@ y^{(i)} &= \color{cyan}{W^{(i)}} \color{lightgreen}{\phi'(q^{(i)})}
 <div class='section-link'>
 <a href='#section-44'>#</a>
 </div>
-<p>Normalize for feed-forward</p>
+
 </div>
 <div class='code'>
-<div class="highlight"><pre><span class="lineno">295</span> <span class="n">z</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">norm_ff</span><span class="p">(</span><span class="n">x</span><span class="p">)</span></pre></div>
+<div class="highlight"><pre><span class="lineno">291</span> <span class="k">def</span> <span class="fm">__call__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">):</span></pre></div>
 </div>
 </div>
 <div class='section' id='section-45'>
@@ -728,10 +725,10 @@ y^{(i)} &= \color{cyan}{W^{(i)}} \color{lightgreen}{\phi'(q^{(i)})}
 <div class='section-link'>
 <a href='#section-45'>#</a>
 </div>
-<p>Pass through the feed-forward network</p>
+<p>Calculate fast weights self attention</p>
 </div>
 <div class='code'>
-<div class="highlight"><pre><span class="lineno">297</span> <span class="n">ff</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">feed_forward</span><span class="p">(</span><span class="n">z</span><span class="p">)</span></pre></div>
+<div class="highlight"><pre><span class="lineno">293</span> <span class="n">attn</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">attn</span><span class="p">(</span><span class="n">x</span><span class="p">)</span></pre></div>
 </div>
 </div>
 <div class='section' id='section-46'>
@@ -739,10 +736,10 @@ y^{(i)} &= \color{cyan}{W^{(i)}} \color{lightgreen}{\phi'(q^{(i)})}
 <div class='section-link'>
 <a href='#section-46'>#</a>
 </div>
-<p>Add the feed-forward results back</p>
+<p>Add the self attention results</p>
 </div>
 <div class='code'>
-<div class="highlight"><pre><span class="lineno">299</span> <span class="n">x</span> <span class="o">=</span> <span class="n">x</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">ff</span><span class="p">)</span></pre></div>
+<div class="highlight"><pre><span class="lineno">295</span> <span class="n">x</span> <span class="o">=</span> <span class="n">x</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">attn</span><span class="p">)</span></pre></div>
 </div>
 </div>
 <div class='section' id='section-47'>
@@ -750,21 +747,21 @@ y^{(i)} &= \color{cyan}{W^{(i)}} \color{lightgreen}{\phi'(q^{(i)})}
 <div class='section-link'>
 <a href='#section-47'>#</a>
 </div>
-
+<p>Normalize for feed-forward</p>
 </div>
 <div class='code'>
-<div class="highlight"><pre><span class="lineno">302</span> <span class="k">return</span> <span class="n">x</span></pre></div>
+<div class="highlight"><pre><span class="lineno">298</span> <span class="n">z</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">norm_ff</span><span class="p">(</span><span class="n">x</span><span class="p">)</span></pre></div>
 </div>
 </div>
 <div class='section' id='section-48'>
-<div class='docs doc-strings'>
+<div class='docs'>
 <div class='section-link'>
 <a href='#section-48'>#</a>
 </div>
-<p>This is a general transformer module with multiple transformer layers</p>
+<p>Pass through the feed-forward network</p>
 </div>
 <div class='code'>
-<div class="highlight"><pre><span class="lineno">305</span><span class="k">class</span> <span class="nc">FastWeightsAttentionTransformer</span><span class="p">(</span><span class="n">Module</span><span class="p">):</span></pre></div>
+<div class="highlight"><pre><span class="lineno">300</span> <span class="n">ff</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">feed_forward</span><span class="p">(</span><span class="n">z</span><span class="p">)</span></pre></div>
 </div>
 </div>
 <div class='section' id='section-49'>
@@ -772,11 +769,10 @@ y^{(i)} &= \color{cyan}{W^{(i)}} \color{lightgreen}{\phi'(q^{(i)})}
 <div class='section-link'>
 <a href='#section-49'>#</a>
 </div>
-
+<p>Add the feed-forward results back</p>
 </div>
 <div class='code'>
-<div class="highlight"><pre><span class="lineno">309</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">layer</span><span class="p">:</span> <span class="n">FastWeightsAttentionTransformerLayer</span><span class="p">,</span> <span class="n">n_layers</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span>
-<span class="lineno">310</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span></pre></div>
+<div class="highlight"><pre><span class="lineno">302</span> <span class="n">x</span> <span class="o">=</span> <span class="n">x</span> <span class="o">+</span> <span class="bp">self</span><span class="o">.</span><span class="n">dropout</span><span class="p">(</span><span class="n">ff</span><span class="p">)</span></pre></div>
 </div>
 </div>
 <div class='section' id='section-50'>
@@ -784,21 +780,21 @@ y^{(i)} &= \color{cyan}{W^{(i)}} \color{lightgreen}{\phi'(q^{(i)})}
 <div class='section-link'>
 <a href='#section-50'>#</a>
 </div>
-<p>Make copies of the transformer layer</p>
+
 </div>
 <div class='code'>
-<div class="highlight"><pre><span class="lineno">312</span> <span class="bp">self</span><span class="o">.</span><span class="n">layers</span> <span class="o">=</span> <span class="n">clone_module_list</span><span class="p">(</span><span class="n">layer</span><span class="p">,</span> <span class="n">n_layers</span><span class="p">)</span></pre></div>
+<div class="highlight"><pre><span class="lineno">305</span> <span class="k">return</span> <span class="n">x</span></pre></div>
 </div>
 </div>
 <div class='section' id='section-51'>
-<div class='docs'>
+<div class='docs doc-strings'>
 <div class='section-link'>
 <a href='#section-51'>#</a>
 </div>
-<p>Final normalization layer</p>
+<p>This is a general transformer module with multiple transformer layers</p>
 </div>
 <div class='code'>
-<div class="highlight"><pre><span class="lineno">314</span> <span class="bp">self</span><span class="o">.</span><span class="n">norm</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">LayerNorm</span><span class="p">([</span><span class="n">layer</span><span class="o">.</span><span class="n">size</span><span class="p">])</span></pre></div>
+<div class="highlight"><pre><span class="lineno">308</span><span class="k">class</span> <span class="nc">FastWeightsAttentionTransformer</span><span class="p">(</span><span class="n">Module</span><span class="p">):</span></pre></div>
 </div>
 </div>
 <div class='section' id='section-52'>
@@ -809,8 +805,8 @@ y^{(i)} &= \color{cyan}{W^{(i)}} \color{lightgreen}{\phi'(q^{(i)})}
 
 </div>
 <div class='code'>
-<div class="highlight"><pre><span class="lineno">316</span> <span class="k">def</span> <span class="fm">__call__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">):</span>
-<span class="lineno">317</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">layer</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">layers</span><span class="p">):</span></pre></div>
+<div class="highlight"><pre><span class="lineno">312</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">layer</span><span class="p">:</span> <span class="n">FastWeightsAttentionTransformerLayer</span><span class="p">,</span> <span class="n">n_layers</span><span class="p">:</span> <span class="nb">int</span><span class="p">):</span>
+<span class="lineno">313</span> <span class="nb">super</span><span class="p">()</span><span class="o">.</span><span class="fm">__init__</span><span class="p">()</span></pre></div>
 </div>
 </div>
 <div class='section' id='section-53'>
@@ -818,10 +814,10 @@ y^{(i)} &= \color{cyan}{W^{(i)}} \color{lightgreen}{\phi'(q^{(i)})}
 <div class='section-link'>
 <a href='#section-53'>#</a>
 </div>
-<p>Get layer output</p>
+<p>Make copies of the transformer layer</p>
 </div>
 <div class='code'>
-<div class="highlight"><pre><span class="lineno">319</span> <span class="n">x</span> <span class="o">=</span> <span class="n">layer</span><span class="p">(</span><span class="n">x</span><span class="p">)</span></pre></div>
+<div class="highlight"><pre><span class="lineno">315</span> <span class="bp">self</span><span class="o">.</span><span class="n">layers</span> <span class="o">=</span> <span class="n">clone_module_list</span><span class="p">(</span><span class="n">layer</span><span class="p">,</span> <span class="n">n_layers</span><span class="p">)</span></pre></div>
 </div>
 </div>
 <div class='section' id='section-54'>
@@ -829,10 +825,44 @@ y^{(i)} &= \color{cyan}{W^{(i)}} \color{lightgreen}{\phi'(q^{(i)})}
 <div class='section-link'>
 <a href='#section-54'>#</a>
 </div>
+<p>Final normalization layer</p>
+</div>
+<div class='code'>
+<div class="highlight"><pre><span class="lineno">317</span> <span class="bp">self</span><span class="o">.</span><span class="n">norm</span> <span class="o">=</span> <span class="n">nn</span><span class="o">.</span><span class="n">LayerNorm</span><span class="p">([</span><span class="n">layer</span><span class="o">.</span><span class="n">size</span><span class="p">])</span></pre></div>
+</div>
+</div>
+<div class='section' id='section-55'>
+<div class='docs'>
+<div class='section-link'>
+<a href='#section-55'>#</a>
+</div>
+
+</div>
+<div class='code'>
+<div class="highlight"><pre><span class="lineno">319</span> <span class="k">def</span> <span class="fm">__call__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">x</span><span class="p">:</span> <span class="n">torch</span><span class="o">.</span><span class="n">Tensor</span><span class="p">):</span>
+<span class="lineno">320</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">layer</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">layers</span><span class="p">):</span></pre></div>
+</div>
+</div>
+<div class='section' id='section-56'>
+<div class='docs'>
+<div class='section-link'>
+<a href='#section-56'>#</a>
+</div>
+<p>Get layer output</p>
+</div>
+<div class='code'>
+<div class="highlight"><pre><span class="lineno">322</span> <span class="n">x</span> <span class="o">=</span> <span class="n">layer</span><span class="p">(</span><span class="n">x</span><span class="p">)</span></pre></div>
+</div>
+</div>
+<div class='section' id='section-57'>
+<div class='docs'>
+<div class='section-link'>
+<a href='#section-57'>#</a>
+</div>
 <p>Normalize the output</p>
 </div>
 <div class='code'>
-<div class="highlight"><pre><span class="lineno">322</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">norm</span><span class="p">(</span><span class="n">x</span><span class="p">)</span></pre></div>
+<div class="highlight"><pre><span class="lineno">325</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">norm</span><span class="p">(</span><span class="n">x</span><span class="p">)</span></pre></div>
 </div>
 </div>
 </div>
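The layer wiring these sections document (fast-weights attention, a residual add with dropout, a pre-norm feed-forward block, a second residual) can be sketched in plain NumPy. This is a hand-rolled illustration, not the repository's module: `attn` and `feed_forward` are stand-in callables, and the minimal `layer_norm` has no learned scale or shift.

```python
import numpy as np

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Minimal LayerNorm over the last axis (no learned gain/bias)
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def layer_forward(x, attn, feed_forward, dropout=lambda t: t):
    # Calculate fast weights self attention, then add the residual
    x = x + dropout(attn(x))
    # Normalize for feed-forward, pass through it, add the residual back
    z = layer_norm(x)
    x = x + dropout(feed_forward(z))
    return x

x = np.random.randn(2, 8)
out = layer_forward(x, attn=lambda t: 0.0 * t, feed_forward=lambda t: t)
```

With a zero attention output and an identity feed-forward, the result is simply `x + layer_norm(x)`, which makes the residual structure easy to check.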
@@ -10,8 +10,8 @@ summary: >

 The paper
 [Linear Transformers Are Secretly Fast Weight Memory Systems in PyTorch](https://arxiv.org/abs/2102.11174)
-finds similarities between linear self attention and fast weight systems,
-and makes modifications to self attention update rule based on that.
+finds similarities between linear self-attention and fast weight systems
+and makes modifications to self-attention update rule based on that.
 It also introduces a simpler, yet effective kernel function.

 ## Fast weights
@@ -82,8 +82,8 @@ The paper introduces a new linear attention projection function $\color{lightgre
 a new update rule for $\color{cyan}{W^{(i)}} = f(\color{cyan}{W^{(i-1)}})$ and changes the normalization
 $\frac{1}{z^{(i)} \cdot \color{lightgreen}{\phi(q^{(i)})}}$

-Here's [the training code](experiment.html) and a notebook for training a fast weights
-transformer on Tiny Shakespeare dataset.
+Here are [the training code](experiment.html) and a notebook for training a fast weights
+transformer on the Tiny Shakespeare dataset.

 [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/transformers/fast_weights/experiment.ipynb)
 [![View Run](https://img.shields.io/badge/labml-experiment-brightgreen)](https://app.labml.ai/run/928aadc0846c11eb85710242ac1c0002)
@@ -277,9 +277,11 @@ class FastWeightsAttentionTransformerLayer(Module):
         super().__init__()
         # Transformer size $d_{model}$
         self.size = d_model
-        #
+        # Fast weights attention module
         self.attn = attn
+        # Feed-forward network
         self.feed_forward = feed_forward
+        # Dropout layer
         self.dropout = nn.Dropout(dropout_prob)

         # Normalization layers
@@ -287,6 +289,7 @@ class FastWeightsAttentionTransformerLayer(Module):
         self.norm_ff = nn.LayerNorm([d_model])

     def __call__(self, x: torch.Tensor):
+        # Calculate fast weights self attention
         attn = self.attn(x)
         # Add the self attention results
         x = x + self.dropout(attn)
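The surrounding `FastWeightsAttentionTransformer` pattern (clone the prototype layer `n_layers` times, run the layers in sequence, normalise the final output) is the usual transformer-stack skeleton. A minimal sketch, using `copy.deepcopy` as a stand-in for the repository's `clone_module_list` helper and a bare mean/std normalisation in place of `nn.LayerNorm`:

```python
import copy
import numpy as np

class Transformer:
    def __init__(self, layer, n_layers: int):
        # Make independent copies of the prototype transformer layer
        self.layers = [copy.deepcopy(layer) for _ in range(n_layers)]

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # Get each layer's output in turn
        for layer in self.layers:
            x = layer(x)
        # Final normalization layer (per-row mean/std, no learned parameters)
        mu = x.mean(-1, keepdims=True)
        sd = x.std(-1, keepdims=True)
        return (x - mu) / (sd + 1e-5)

class Doubler:
    """Toy stand-in for a transformer layer."""
    def __call__(self, x):
        return 2.0 * x

model = Transformer(Doubler(), n_layers=3)
y = model(np.array([[1.0, 2.0, 3.0]]))
```

Deep-copying matters here: each layer must get its own parameters, which is exactly what `clone_module_list` provides for `nn.Module`s in the actual code.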
labml_nn/transformers/fast_weights/experiment.ipynb (new file, 425 lines)
@@ -0,0 +1,425 @@
+{
+"nbformat": 4,
+"nbformat_minor": 0,
+"metadata": {
+"colab": {
+"name": "Fast Weights Transformer",
+"provenance": [],
+"collapsed_sections": []
+},
+"kernelspec": {
+"name": "python3",
+"display_name": "Python 3"
+},
+"accelerator": "GPU"
+},
+"cells": [
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "AYV_dMVDxyc2"
+},
+"source": [
+"[](https://github.com/lab-ml/nn)\n",
+"[](https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/transformers/fast_weights/experiment.ipynb) \n",
+"\n",
+"## Fast Weights Transformer\n",
+"\n",
+"This is an experiment training a Fast Weights Transformer model on the Tiny Shakespeare dataset."
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "AahG_i2y5tY9"
+},
+"source": [
+"Install the `labml-nn` package"
+]
+},
+{
+"cell_type": "code",
+"metadata": {
+"id": "ZCzmCrAIVg0L",
+"colab": {
+"base_uri": "https://localhost:8080/"
+},
+"outputId": "b73a038d-62dc-48e0-c2f1-9454224e9137"
+},
+"source": [
+"!pip install labml-nn"
+],
+"execution_count": 1,
+"outputs": [
+{
+"output_type": "stream",
+"text": [
+"Collecting labml-nn\n",
+"\u001b[?25l  Downloading https://files.pythonhosted.org/packages/e8/e9/9fc12f6e1f407399f518ddf934410f0186578b6e2dc5dbc17b47910a88ad/labml_nn-0.4.91-py3-none-any.whl (171kB)\n",
+"\u001b[K     |████████████████████████████████| 174kB 9.2MB/s \n",
+"\u001b[?25hRequirement already satisfied: numpy in /usr/local/lib/python3.7/dist-packages (from labml-nn) (1.19.5)\n",
+"Collecting labml-helpers>=0.4.76\n",
+"  Downloading https://files.pythonhosted.org/packages/49/df/4d920a4a221acd3cfa384dddb909ed0691b08682c0d8aeaabeee2138624f/labml_helpers-0.4.76-py3-none-any.whl\n",
+"Requirement already satisfied: torch in /usr/local/lib/python3.7/dist-packages (from labml-nn) (1.8.0+cu101)\n",
+"Collecting labml>=0.4.103\n",
+"\u001b[?25l  Downloading https://files.pythonhosted.org/packages/c7/c6/9d7e6664116d972b33d1c41ab8f5609e59a218f9ee7d0e3a7f433ddce07e/labml-0.4.106-py3-none-any.whl (103kB)\n",
+"\u001b[K     |████████████████████████████████| 112kB 17.1MB/s \n",
+"\u001b[?25hCollecting einops\n",
+"  Downloading https://files.pythonhosted.org/packages/5d/a0/9935e030634bf60ecd572c775f64ace82ceddf2f504a5fd3902438f07090/einops-0.3.0-py2.py3-none-any.whl\n",
+"Requirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from torch->labml-nn) (3.7.4.3)\n",
+"Requirement already satisfied: pyyaml in /usr/local/lib/python3.7/dist-packages (from labml>=0.4.103->labml-nn) (3.13)\n",
+"Collecting gitpython\n",
+"\u001b[?25l  Downloading https://files.pythonhosted.org/packages/a6/99/98019716955ba243657daedd1de8f3a88ca1f5b75057c38e959db22fb87b/GitPython-3.1.14-py3-none-any.whl (159kB)\n",
+"\u001b[K     |████████████████████████████████| 163kB 15.3MB/s \n",
+"\u001b[?25hCollecting gitdb<5,>=4.0.1\n",
+"\u001b[?25l  Downloading https://files.pythonhosted.org/packages/48/11/d1800bca0a3bae820b84b7d813ad1eff15a48a64caea9c823fc8c1b119e8/gitdb-4.0.5-py3-none-any.whl (63kB)\n",
+"\u001b[K     |████████████████████████████████| 71kB 10.6MB/s \n",
+"\u001b[?25hCollecting smmap<4,>=3.0.1\n",
+"  Downloading https://files.pythonhosted.org/packages/d5/1e/6130925131f639b2acde0f7f18b73e33ce082ff2d90783c436b52040af5a/smmap-3.0.5-py2.py3-none-any.whl\n",
+"Installing collected packages: smmap, gitdb, gitpython, labml, labml-helpers, einops, labml-nn\n",
+"Successfully installed einops-0.3.0 gitdb-4.0.5 gitpython-3.1.14 labml-0.4.106 labml-helpers-0.4.76 labml-nn-0.4.91 smmap-3.0.5\n"
+],
+"name": "stdout"
+}
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "SE2VUQ6L5zxI"
+},
+"source": [
+"Imports"
+]
+},
+{
+"cell_type": "code",
+"metadata": {
+"id": "0hJXx_g0wS2C"
+},
+"source": [
+"import torch\n",
+"import torch.nn as nn\n",
+"\n",
+"from labml import experiment\n",
+"from labml.configs import option\n",
+"from labml_nn.transformers.fast_weights.experiment import Configs"
+],
+"execution_count": 2,
+"outputs": []
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "Lpggo0wM6qb-"
+},
+"source": [
+"Create an experiment"
+]
+},
+{
+"cell_type": "code",
+"metadata": {
+"id": "bFcr9k-l4cAg"
+},
+"source": [
+"experiment.create(name=\"fast_weights_transformer\")"
+],
+"execution_count": 3,
+"outputs": []
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "-OnHLi626tJt"
+},
+"source": [
+"Initialize configurations"
+]
+},
+{
+"cell_type": "code",
+"metadata": {
+"id": "Piz0c5f44hRo"
+},
+"source": [
+"conf = Configs()"
+],
+"execution_count": 4,
+"outputs": []
+},
+{
+"cell_type": "markdown",
+"metadata": {
+"id": "wwMzCqpD6vkL"
+},
+"source": [
+"Set experiment configurations and assign a configurations dictionary to override configurations"
+]
+},
+{
+"cell_type": "code",
+"metadata": {
+"colab": {
+"base_uri": "https://localhost:8080/",
+"height": 17
+},
+"id": "e6hmQhTw4nks",
+"outputId": "6660f5c6-347c-4370-f19a-ff7bd9b96535"
+},
+"source": [
+"experiment.configs(conf,\n",
+"                   # A dictionary of configurations to override\n",
+"                   {'tokenizer': 'character',\n",
+"                    'text': 'tiny_shakespeare',\n",
+"                    'optimizer.learning_rate': 1.0,\n",
+"                    'optimizer.optimizer': 'Noam',\n",
+"                    'prompt': 'It is',\n",
+"                    'prompt_separator': '',\n",
+"\n",
+"                    'train_loader': 'shuffled_train_loader',\n",
+"                    'valid_loader': 'shuffled_valid_loader',\n",
+"\n",
+"                    'seq_len': 128,\n",
+"                    'epochs': 128,\n",
+"                    'batch_size': 16,\n",
+"                    'inner_iterations': 25})"
+],
+"execution_count": 5,
+"outputs": [
+{
+"output_type": "display_data",
+"data": {
+"text/html": [
+"<pre style=\"overflow-x: scroll;\"></pre>"
+],
+"text/plain": [
+"<IPython.core.display.HTML object>"
+]
+},
+"metadata": {
+"tags": []
+}
+}
+]
+},
{
|
||||
"cell_type": "markdown",
|
||||
"metadata": {
|
||||
"id": "EvI7MtgJ61w5"
|
||||
},
|
||||
"source": [
|
||||
"Set PyTorch models for loading and saving"
|
||||
]
|
||||
},
|
||||
{
|
||||
"cell_type": "code",
|
||||
"metadata": {
|
||||
"colab": {
|
||||
"base_uri": "https://localhost:8080/",
|
||||
"height": 255
|
||||
},
|
||||
"id": "GDlt7dp-5ALt",
|
||||
"outputId": "5154be3b-dbfc-4002-82f6-1b6d58970e21"
|
||||
},
|
||||
"source": [
|
||||
"experiment.add_pytorch_models({'model': conf.model})"
|
||||
],
|
||||
"execution_count": 6,
|
||||
"outputs": [
|
||||
{
|
||||
"output_type": "display_data",
|
||||
"data": {
|
||||
"text/html": [
|
||||
"<pre style=\"overflow-x: scroll;\">Prepare model...\n",
|
||||
" Prepare n_tokens...\n",
|
||||
" Prepare text...\n",
|
||||
" Prepare tokenizer<span style=\"color: #00A250\">...[DONE]</span><span style=\"color: #208FFB\">\t5.76ms</span>\n",
|
||||
" Download<span style=\"color: #00A250\">...[DONE]</span><span style=\"color: #208FFB\">\t483.76ms</span>\n",
|
||||
" Load data<span style=\"color: #00A250\">...[DONE]</span><span style=\"color: #208FFB\">\t3.16ms</span>\n",
|
||||
" Tokenize<span style=\"color: #00A250\">...[DONE]</span><span style=\"color: #208FFB\">\t62.37ms</span>\n",
|
||||
" Build vocabulary<span style=\"color: #00A250\">...[DONE]</span><span style=\"color: #208FFB\">\t167.37ms</span>\n",
|
||||
" Prepare text<span style=\"color: #00A250\">...[DONE]</span><span style=\"color: #208FFB\">\t739.15ms</span>\n",
" Prepare n_tokens<span style=\"color: #00A250\">...[DONE]</span><span style=\"color: #208FFB\">\t747.44ms</span>\n",
" Prepare device...\n",
" Prepare device_info<span style=\"color: #00A250\">...[DONE]</span><span style=\"color: #208FFB\">\t66.05ms</span>\n",
" Prepare device<span style=\"color: #00A250\">...[DONE]</span><span style=\"color: #208FFB\">\t68.84ms</span>\n",
"Prepare model<span style=\"color: #00A250\">...[DONE]</span><span style=\"color: #208FFB\">\t11,409.04ms</span>\n",
"</pre>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {
"tags": []
}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "KJZRf8527GxL"
},
"source": [
"Start the experiment and run the training loop."
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "aIAWo7Fw5DR8",
"outputId": "583dc28a-338e-4089-a415-e21d8b65fa3e"
},
"source": [
"with experiment.start():\n",
" conf.run()"
],
"execution_count": 7,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"<pre style=\"overflow-x: scroll;\">\n",
"<strong><span style=\"text-decoration: underline\">fast_weights_transformer</span></strong>: <span style=\"color: #208FFB\">928aadc0846c11eb85710242ac1c0002</span>\n",
"\t[dirty]: <strong><span style=\"color: #DDB62B\">\"\"</span></strong>\n",
"Initialize...\n",
" Prepare mode<span style=\"color: #00A250\">...[DONE]</span><span style=\"color: #208FFB\">\t4.62ms</span>\n",
"Initialize<span style=\"color: #00A250\">...[DONE]</span><span style=\"color: #208FFB\">\t66.55ms</span>\n",
"Prepare validator...\n",
" Prepare valid_loader<span style=\"color: #00A250\">...[DONE]</span><span style=\"color: #208FFB\">\t78.48ms</span>\n",
"Prepare validator<span style=\"color: #00A250\">...[DONE]</span><span style=\"color: #208FFB\">\t166.22ms</span>\n",
"Prepare trainer...\n",
" Prepare train_loader<span style=\"color: #00A250\">...[DONE]</span><span style=\"color: #208FFB\">\t158.29ms</span>\n",
"Prepare trainer<span style=\"color: #00A250\">...[DONE]</span><span style=\"color: #208FFB\">\t191.71ms</span>\n",
"Prepare training_loop...\n",
" Prepare loop_count<span style=\"color: #00A250\">...[DONE]</span><span style=\"color: #208FFB\">\t35.61ms</span>\n",
"<span style=\"color: #C5C1B4\"></span>\n",
"<span style=\"color: #C5C1B4\">--------------------------------------------------</span><span style=\"color: #DDB62B\"><strong><span style=\"text-decoration: underline\"></span></strong></span>\n",
"<span style=\"color: #DDB62B\"><strong><span style=\"text-decoration: underline\">LABML WARNING</span></strong></span>\n",
"<span style=\"color: #DDB62B\"><strong><span style=\"text-decoration: underline\"></span></strong></span>LabML App Warning: <span style=\"color: #60C6C8\">empty_token: </span><strong>Please create a valid token at https://app.labml.ai.</strong>\n",
"<strong>Click on the experiment link to monitor the experiment and add it to your experiments list.</strong><span style=\"color: #C5C1B4\"></span>\n",
"<span style=\"color: #C5C1B4\">--------------------------------------------------</span>\n",
"<span style=\"color: #208FFB\">Monitor experiment at </span><a href='https://app.labml.ai/run/928aadc0846c11eb85710242ac1c0002' target='blank'>https://app.labml.ai/run/928aadc0846c11eb85710242ac1c0002</a>\n",
"Prepare training_loop<span style=\"color: #00A250\">...[DONE]</span><span style=\"color: #208FFB\">\t243.56ms</span>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong>h</strong><strong>j</strong><strong>S</strong><strong>N</strong><strong>X</strong><strong>p</strong><strong>h</strong><strong>j</strong><strong>S</strong><strong>N</strong><strong>X</strong><strong>p</strong><strong>h</strong><strong>j</strong><strong>S</strong><strong>N</strong><strong>X</strong><strong>p</strong><strong>h</strong><strong>j</strong><strong>S</strong><strong>N</strong><strong>X</strong><strong>p</strong><strong>h</strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong> </strong><strong>t</strong><strong> </strong><strong>t</strong><strong> </strong><strong>t</strong><strong> </strong><strong>t</strong><strong> </strong><strong>t</strong><strong> </strong><strong>t</strong><strong> </strong><strong>t</strong><strong> </strong><strong>t</strong><strong> </strong><strong>t</strong><strong> </strong><strong>t</strong><strong> </strong><strong>t</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong>r</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong>r</strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong>r</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong>r</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong>r</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong>r</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong>r</strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>a</strong><strong>n</strong><strong>d</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>a</strong><strong>n</strong><strong>d</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>o</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>o</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>o</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong>\n",
"<strong><span style=\"color: #DDB62B\">1,003,776: </span></strong>Sample:<span style=\"color: #C5C1B4\"> 100%</span><span style=\"color: #208FFB\"> 1,117ms </span>Train:<span style=\"color: #C5C1B4\"> 100%</span><span style=\"color: #208FFB\"> 540,586ms </span>Valid:<span style=\"color: #C5C1B4\"> 100%</span><span style=\"color: #208FFB\"> 9,472ms </span> loss.train: <span style=\"color: #C5C1B4\"> 2.17545</span> accuracy.train: <span style=\"color: #C5C1B4\">0.322394</span> loss.valid: <span style=\"color: #C5C1B4\"> 2.12992</span> accuracy.valid: <span style=\"color: #C5C1B4\">0.322833</span> <span style=\"color: #208FFB\">508,423ms</span><span style=\"color: #D160C4\"> 0:08m/ 17:56m </span>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>a</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>a</strong><strong>n</strong><strong>d</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>a</strong><strong>n</strong><strong>d</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>o</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>o</strong><strong> </strong><strong>t</strong><strong>h</strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>h</strong><strong>a</strong><strong>l</strong><strong>l</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>t</strong><strong>r</strong><strong>a</strong><strong>i</strong><strong>n</strong><strong>g</strong><strong> </strong><strong>t</strong><strong>h</strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>a</strong><strong>n</strong><strong>d</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>a</strong><strong>d</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>a</strong><strong>d</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>a</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>o</strong><strong> </strong><strong>b</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>o</strong><strong> </strong><strong>b</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>o</strong><strong>m</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>o</strong><strong>m</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>a</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>w</strong><strong>i</strong><strong>l</strong><strong>l</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong>r</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong>r</strong><strong>e</strong><strong> </strong><strong>t</strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>a</strong><strong>y</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>a</strong><strong>y</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>a</strong><strong>y</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>a</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>h</strong><strong>e</strong><strong>a</strong><strong>r</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>h</strong><strong>e</strong><strong>a</strong><strong>r</strong><strong></strong>\n",
"<strong></strong><strong>T</strong><strong>o</strong><strong> </strong><strong>t</strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>a</strong><strong>n</strong><strong>d</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>w</strong><strong>i</strong><strong>t</strong><strong>h</strong><strong> </strong><strong>a</strong><strong>n</strong><strong>d</strong><strong> </strong><strong>b</strong><strong>e</strong><strong>a</strong><strong>r</strong><strong> </strong><strong>a</strong><strong>n</strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>t</strong><strong>a</strong><strong>n</strong><strong>d</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>t</strong><strong>a</strong><strong>n</strong><strong>d</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>a</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>h</strong><strong>a</strong><strong>l</strong><strong>l</strong><strong> </strong><strong>s</strong><strong>h</strong><strong>a</strong><strong>l</strong><strong>l</strong><strong> </strong><strong>b</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>w</strong><strong>i</strong><strong>l</strong><strong>l</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>t</strong><strong>a</strong><strong>y</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>w</strong><strong>i</strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>h</strong><strong>a</strong><strong>l</strong><strong>l</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>h</strong><strong>a</strong><strong>l</strong><strong>l</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>h</strong><strong>a</strong><strong>l</strong><strong>l</strong><strong> </strong><strong>b</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>h</strong><strong>a</strong><strong>l</strong><strong>l</strong><strong> </strong><strong>b</strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>s</strong><strong>h</strong><strong>a</strong><strong>l</strong><strong>l</strong><strong> </strong><strong>b</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>h</strong><strong>a</strong><strong>l</strong><strong>l</strong><strong> </strong><strong>b</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>h</strong><strong>a</strong><strong>l</strong><strong>l</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>h</strong><strong>a</strong><strong>l</strong><strong>l</strong><strong> </strong><strong>b</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>h</strong><strong>a</strong><strong>l</strong><strong>l</strong><strong> </strong><strong>b</strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>a</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>w</strong><strong>i</strong><strong>l</strong><strong>l</strong><strong> </strong><strong>b</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>w</strong><strong>a</strong><strong>s</strong><strong> </strong><strong>t</strong><strong>h</strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>h</strong><strong>a</strong><strong>m</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>h</strong><strong>a</strong><strong>m</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>t</strong><strong>a</strong><strong>y</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>t</strong><strong>a</strong><strong>n</strong><strong>d</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>h</strong><strong>a</strong><strong>l</strong><strong>l</strong><strong> </strong><strong>b</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>h</strong><strong>a</strong><strong>l</strong><strong>l</strong><strong> </strong><strong>b</strong>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>o</strong><strong>u</strong><strong>l</strong><strong>s</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>o</strong><strong>u</strong><strong>l</strong><strong>s</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong>\n",
"<strong><span style=\"color: #DDB62B\">2,007,552: </span></strong>Sample:<span style=\"color: #C5C1B4\"> 100%</span><span style=\"color: #208FFB\"> 1,065ms </span>Train:<span style=\"color: #C5C1B4\"> 100%</span><span style=\"color: #208FFB\"> 488,026ms </span>Valid:<span style=\"color: #C5C1B4\"> 100%</span><span style=\"color: #208FFB\"> 9,364ms </span> loss.train: <span style=\"color: #C5C1B4\"> 1.66291</span> accuracy.train: <span style=\"color: #C5C1B4\">0.480229</span> loss.valid: <span style=\"color: #C5C1B4\"> 1.78311</span> accuracy.valid: <span style=\"color: #C5C1B4\">0.446461</span> <span style=\"color: #208FFB\">512,805ms</span><span style=\"color: #D160C4\"> 0:17m/ 17:56m </span>\n",
"<span style=\"color: #C5C1B4\">It is</span><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>h</strong><strong>a</strong><strong>l</strong><strong>l</strong><strong> </strong><strong>b</strong><strong>e</strong><strong> </strong><strong>t</strong><strong>h</strong><strong>e</strong><strong> </strong><strong>s</strong><strong>h</strong><strong>a</strong><strong>l</strong><strong>l</strong><strong> </strong><strong>b</strong>\n",
"<strong><span style=\"color: #DDB62B\">2,034,176: </span></strong>Sample:<span style=\"color: #C5C1B4\"> 100%</span><span style=\"color: #208FFB\"> 1,046ms </span>Train:<span style=\"color: #C5C1B4\"> 2%</span><span style=\"color: #208FFB\"> 488,026ms </span>Valid:<span style=\"color: #C5C1B4\"> 100%</span><span style=\"color: #208FFB\"> 9,364ms </span> loss.train: <strong> 1.57689</strong> accuracy.train: <strong>0.514986</strong> loss.valid: <span style=\"color: #C5C1B4\"> 1.78311</span> accuracy.valid: <span style=\"color: #C5C1B4\">0.446461</span> <span style=\"color: #208FFB\">512,805ms</span><span style=\"color: #D160C4\"> 0:17m/ 17:56m </span>\n",
"Saving model...\n",
"<span style=\"color: #E75C58\">Killing loop...</span>\n",
"<strong><span style=\"color: #DDB62B\">Still updating app.labml.ai, please wait for it to complete...</span></strong></pre>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {
"tags": []
}
},
{
"output_type": "error",
"ename": "KeyboardInterrupt",
"evalue": "ignored",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mKeyboardInterrupt\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m<ipython-input-7-64a311337f9f>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mexperiment\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstart\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mconf\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
"\u001b[0;32m/usr/local/lib/python3.7/dist-packages/labml_helpers/train_valid.py\u001b[0m in \u001b[0;36mrun\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 241\u001b[0m \u001b[0m_\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrainer\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 242\u001b[0m \u001b[0;32mfor\u001b[0m \u001b[0m_\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtraining_loop\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 243\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrun_step\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 244\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 245\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0msample\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/local/lib/python3.7/dist-packages/labml_helpers/train_valid.py\u001b[0m in \u001b[0;36mrun_step\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 229\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmode\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mupdate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mis_train\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mTrue\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 230\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mtracker\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnamespace\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'train'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 231\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtrainer\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 232\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mvalidator\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 233\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mtracker\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnamespace\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'valid'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/local/lib/python3.7/dist-packages/labml_helpers/train_valid.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 133\u001b[0m \u001b[0msm\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mon_epoch_start\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 134\u001b[0m \u001b[0;32mwith\u001b[0m \u001b[0mtorch\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mset_grad_enabled\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmode\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mis_train\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 135\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__iterate\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 136\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 137\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_batch_index\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcompleted\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/local/lib/python3.7/dist-packages/labml_helpers/train_valid.py\u001b[0m in \u001b[0;36m__iterate\u001b[0;34m(self)\u001b[0m\n\u001b[1;32m 146\u001b[0m \u001b[0mbatch\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnext\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__iterable\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 147\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 148\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstep\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mbatch\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_batch_index\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 149\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 150\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_batch_index\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstep\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/local/lib/python3.7/dist-packages/labml_nn/experiments/nlp_autoregression.py\u001b[0m in \u001b[0;36mstep\u001b[0;34m(self, batch, batch_idx)\u001b[0m\n\u001b[1;32m 134\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmode\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mis_train\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 135\u001b[0m \u001b[0;31m# Calculate gradients\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 136\u001b[0;31m \u001b[0mloss\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbackward\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 137\u001b[0m \u001b[0;31m# Clip gradients\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 138\u001b[0m \u001b[0mtorch\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnn\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mutils\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mclip_grad_norm_\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mmodel\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mparameters\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mmax_norm\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgrad_norm_clip\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/local/lib/python3.7/dist-packages/torch/tensor.py\u001b[0m in \u001b[0;36mbackward\u001b[0;34m(self, gradient, retain_graph, create_graph, inputs)\u001b[0m\n\u001b[1;32m 243\u001b[0m \u001b[0mcreate_graph\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mcreate_graph\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 244\u001b[0m inputs=inputs)\n\u001b[0;32m--> 245\u001b[0;31m \u001b[0mtorch\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mautograd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbackward\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgradient\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mretain_graph\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcreate_graph\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minputs\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0minputs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 246\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 247\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mregister_hook\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mhook\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py\u001b[0m in \u001b[0;36mbackward\u001b[0;34m(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)\u001b[0m\n\u001b[1;32m 145\u001b[0m Variable._execution_engine.run_backward(\n\u001b[1;32m 146\u001b[0m \u001b[0mtensors\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mgrad_tensors_\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mretain_graph\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcreate_graph\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0minputs\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 147\u001b[0;31m allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag\n\u001b[0m\u001b[1;32m 148\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 149\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/local/lib/python3.7/dist-packages/labml_helpers/training_loop.py\u001b[0m in \u001b[0;36m__handler\u001b[0;34m(self, sig, frame)\u001b[0m\n\u001b[1;32m 162\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m__finish\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 163\u001b[0m \u001b[0mlogger\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mlog\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'Killing loop...'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mText\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdanger\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 164\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mold_handler\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msig\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mframe\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 165\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 166\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0m__str__\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mKeyboardInterrupt\u001b[0m: "
]
},
{
"output_type": "display_data",
"data": {
"text/html": [
"<pre style=\"overflow-x: scroll;\"><span style=\"color: #208FFB\">Updating App. Please wait...</span></pre>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {
"tags": []
}
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "oBXXlP2b7XZO"
},
"source": [
""
],
"execution_count": null,
"outputs": []
}
]
}