Mirror of https://github.com/labmlai/annotated_deep_learning_paper_implementations.git, synced 2025-08-26 08:41:23 +08:00
✍️ typos
@@ -16,13 +16,13 @@ This is a [PyTorch](https://pytorch.org) implementation of
 * Tricky for RNNs. Do you need different normalizations for each step?
 * Doesn't work with small batch sizes;
 large NLP models are usually trained with small batch sizes.
-* Need to compute means and variances across devices in distributed training
+* Need to compute means and variances across devices in distributed training.
 
 ## Layer Normalization
 
 Layer normalization is a simpler normalization method that works
 on a wider range of settings.
-Layer normalization transformers the inputs to have zero mean and unit variance
+Layer normalization transforms the inputs to have zero mean and unit variance
 across the features.
 *Note that batch normalization fixes the zero mean and unit variance for each element.*
 Layer normalization does it for each batch across all elements.
@@ -9,13 +9,13 @@ This is a [PyTorch](https://pytorch.org) implementation of
 * Tricky for RNNs. Do you need different normalizations for each step?
 * Doesn't work with small batch sizes;
 large NLP models are usually trained with small batch sizes.
-* Need to compute means and variances across devices in distributed training
+* Need to compute means and variances across devices in distributed training.
 
 ## Layer Normalization
 
 Layer normalization is a simpler normalization method that works
 on a wider range of settings.
-Layer normalization transformers the inputs to have zero mean and unit variance
+Layer normalization transforms the inputs to have zero mean and unit variance
 across the features.
 *Note that batch normalization fixes the zero mean and unit variance for each element.*
 Layer normalization does it for each batch across all elements.
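For context on the sentence this commit corrects: layer normalization computes the mean and variance over the features of each individual element, while batch normalization computes them over the batch for each feature. Below is a minimal PyTorch sketch of that distinction; it is not part of the commit, and the tensor shapes are chosen purely for illustration.

```python
import torch

# A batch of 4 elements, each with 8 features.
x = torch.randn(4, 8)
eps = 1e-5

# Layer normalization: statistics over the features (dim=-1),
# computed independently for each element in the batch.
ln_mean = x.mean(dim=-1, keepdim=True)                # shape (4, 1)
ln_var = x.var(dim=-1, keepdim=True, unbiased=False)  # shape (4, 1)
x_ln = (x - ln_mean) / torch.sqrt(ln_var + eps)

# Batch normalization: statistics over the batch (dim=0),
# computed independently for each feature.
bn_mean = x.mean(dim=0, keepdim=True)                 # shape (1, 8)
bn_var = x.var(dim=0, keepdim=True, unbiased=False)   # shape (1, 8)
x_bn = (x - bn_mean) / torch.sqrt(bn_var + eps)

# Each row of x_ln, and each column of x_bn, has roughly
# zero mean and unit variance.
print(x_ln.mean(dim=-1), x_ln.var(dim=-1, unbiased=False))
print(x_bn.mean(dim=0), x_bn.var(dim=0, unbiased=False))
```

PyTorch's built-in `torch.nn.LayerNorm(8)` performs the same per-element normalization over the last dimension, with an additional learnable gain and bias.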