mirror of https://github.com/labmlai/annotated_deep_learning_paper_implementations.git
synced 2025-11-04 06:16:05 +08:00

Commit: internal covariate shift
@@ -82,7 +82,7 @@ network parameters during training.
 For example, let’s say there are two layers $l_1$ and $l_2$.
 During the beginning of the training $l_1$ outputs (inputs to $l_2$)
 could be in distribution $\mathcal{N}(0.5, 1)$.
-Then, after some training steps, it could move to $\mathcal{N}(0.5, 1)$.
+Then, after some training steps, it could move to $\mathcal{N}(0.6, 1.5)$.
 This is <em>internal covariate shift</em>.</p>
 <p>Internal covariate shift will adversely affect training speed because the later layers
 ($l_2$ in the above example) have to adapt to this shifted distribution.</p>
@@ -18,7 +18,7 @@ network parameters during training.
 For example, let's say there are two layers $l_1$ and $l_2$.
 During the beginning of the training $l_1$ outputs (inputs to $l_2$)
 could be in distribution $\mathcal{N}(0.5, 1)$.
-Then, after some training steps, it could move to $\mathcal{N}(0.5, 1)$.
+Then, after some training steps, it could move to $\mathcal{N}(0.6, 1.5)$.
 This is *internal covariate shift*.

 Internal covariate shift will adversely affect training speed because the later layers
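The commit corrects the example so the distribution actually changes, from $\mathcal{N}(0.5, 1)$ to $\mathcal{N}(0.6, 1.5)$. As a minimal NumPy sketch (not from the repository; all names and constants here are illustrative), here is how a parameter update in $l_1$ shifts the distribution of the activations that $l_2$ receives, even though the raw inputs stay fixed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed input batch and a toy linear layer l1; l2 would consume l1's outputs.
x = rng.normal(0.0, 1.0, size=(10_000, 32))
w1 = rng.normal(0.0, 1.0 / np.sqrt(32), size=(32, 16))  # ~unit output variance
b1 = 0.5                                                 # l1 outputs start near N(0.5, 1)

h_before = x @ w1 + b1
print(f"before: mean={h_before.mean():.2f}, std={h_before.std():.2f}")

# Pretend a few gradient steps nudged l1's parameters.
w1 = w1 * 1.5   # larger weights -> wider spread of l1's outputs
b1 = b1 + 0.1   # shifted bias   -> larger mean

h_after = x @ w1 + b1
print(f"after:  mean={h_after.mean():.2f}, std={h_after.std():.2f}")
```

The inputs `x` never change; only `w1` and `b1` do, yet the distribution seen by the next layer drifts (here roughly from mean 0.5, std 1 toward mean 0.6, std 1.5). That drift is the internal covariate shift the text describes, and it is what batch normalization is designed to counteract.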