From 2e545435e8c9f3c3c831a3d45e6a5a99a42b9f28 Mon Sep 17 00:00:00 2001
From: Varuna Jayasiri
diff --git a/labml_nn/normalization/batch_norm/__init__.py b/labml_nn/normalization/batch_norm/__init__.py
index eef75e2c..bf60e27e 100644
--- a/labml_nn/normalization/batch_norm/__init__.py
+++ b/labml_nn/normalization/batch_norm/__init__.py
@@ -18,7 +18,7 @@ network parameters during training.
 For example, let's say there are two layers $l_1$ and $l_2$.
 During the beginning of the training $l_1$ outputs (inputs to $l_2$)
 could be in distribution $\mathcal{N}(0.5, 1)$.
-Then, after some training steps, it could move to $\mathcal{N}(0.5, 1)$.
+Then, after some training steps, it could move to $\mathcal{N}(0.6, 1.5)$.
 This is *internal covariate shift*.
 Internal covariate shift will adversely affect training speed because the later layers
 ($l_2$ in the above example) have to adapt to this shifted distribution.
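The hunk above describes internal covariate shift: the distribution of $l_1$'s outputs (the inputs $l_2$ sees) drifts as training updates $l_1$'s parameters. Below is a minimal sketch, not taken from the patch, that illustrates this drift with a toy two-layer PyTorch model; the layer sizes, learning rate, and step count are arbitrary assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy network: l1 feeds l2 (sizes chosen arbitrarily).
l1 = nn.Linear(32, 16)
l2 = nn.Linear(16, 1)
optimizer = torch.optim.SGD(list(l1.parameters()) + list(l2.parameters()), lr=0.1)

x = torch.randn(1024, 32)   # a fixed batch of inputs
y = torch.randn(1024, 1)    # random regression targets


def l2_input_stats():
    # Mean and standard deviation of l1's outputs, i.e. the inputs that l2 sees.
    with torch.no_grad():
        h = torch.relu(l1(x))
    return h.mean().item(), h.std().item()


print('inputs to l2 before training:', l2_input_stats())

for _ in range(100):
    optimizer.zero_grad()
    loss = ((l2(torch.relu(l1(x))) - y) ** 2).mean()
    loss.backward()
    optimizer.step()

# After training, the mean/std of l1's outputs have drifted, so l2 has had to
# keep adapting to a shifting input distribution throughout training.
print('inputs to l2 after training: ', l2_input_stats())
```

Printing the statistics before and after the optimizer steps shows the shift in $l_2$'s input distribution that batch normalization is designed to reduce.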