mirror of
https://github.com/labmlai/annotated_deep_learning_paper_implementations.git
synced 2025-11-02 21:40:15 +08:00
internal covariate shift
@@ -82,7 +82,7 @@ network parameters during training.
 For example, let’s say there are two layers $l_1$ and $l_2$.
 During the beginning of the training $l_1$ outputs (inputs to $l_2$)
 could be in distribution $\mathcal{N}(0.5, 1)$.
-Then, after some training steps, it could move to $\mathcal{N}(0.5, 1)$.
+Then, after some training steps, it could move to $\mathcal{N}(0.6, 1.5)$.
 This is <em>internal covariate shift</em>.</p>
 <p>Internal covariate shift will adversely affect training speed because the later layers
 ($l_2$ in the above example) have to adapt to this shifted distribution.</p>
@@ -18,7 +18,7 @@ network parameters during training.
 For example, let's say there are two layers $l_1$ and $l_2$.
 During the beginning of the training $l_1$ outputs (inputs to $l_2$)
 could be in distribution $\mathcal{N}(0.5, 1)$.
-Then, after some training steps, it could move to $\mathcal{N}(0.5, 1)$.
+Then, after some training steps, it could move to $\mathcal{N}(0.6, 1.5)$.
 This is *internal covariate shift*.
 
 Internal covariate shift will adversely affect training speed because the later layers
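The corrected example above can be sketched numerically. The snippet below is an illustration (not code from the repo): a linear layer whose outputs start near $\mathcal{N}(0.5, 1)$ and, after a hypothetical weight update, drift toward $\mathcal{N}(0.6, 1.44)$, close to the $\mathcal{N}(0.6, 1.5)$ in the text; the specific weights and the `batch_norm` helper are assumptions chosen to reproduce those statistics.

```python
# Illustrative sketch of internal covariate shift (assumed example,
# not taken from the repository's implementation).
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(10_000, 4))  # fixed input batch to layer l1

def l1_output(w, b):
    # l1 as a single linear unit; its outputs are the inputs to l2
    return x @ w + b

# Early in training: mean 0.5, variance 4 * 0.5^2 = 1, i.e. roughly N(0.5, 1)
y0 = l1_output(np.full(4, 0.5), 0.5)

# After some (hypothetical) updates: mean 0.6, variance 4 * 0.6^2 = 1.44,
# so l2's input distribution has shifted -- internal covariate shift
y1 = l1_output(np.full(4, 0.6), 0.6)

def batch_norm(y, eps=1e-5):
    # Standardize the batch, so l2 sees a stable distribution (mean ~0,
    # variance ~1) no matter how l1's parameters move
    return (y - y.mean()) / np.sqrt(y.var() + eps)

print(y0.mean(), y1.mean())          # drifts from ~0.5 toward ~0.6
print(batch_norm(y1).mean())         # ~0 after normalization
```

This is what batch normalization exploits: by normalizing each batch of $l_1$'s outputs, the later layer no longer has to chase the shifting distribution.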