📚 group norm improvements

This commit is contained in:
Varuna Jayasiri
2021-04-24 15:17:14 +05:30
parent 39b9826646
commit fcc4994b3a


This is based on the observation that classical features such as SIFT and HOG are group-wise features.
The paper proposes dividing feature channels into groups and then separately normalizing
all channels within each group.
## Formulation
All normalization layers can be defined by the following computation.
$$\hat{x}_i = \frac{1}{\sigma_i} (x_i - \mu_i)$$
where $x$ is the tensor representing the batch,
and $i$ is the index of a single value.
For instance, for 2D images,
$i = (i_N, i_C, i_H, i_W)$ is a 4-d vector indexing
the image within the batch, the feature channel, and the vertical and horizontal coordinates.
$\mu_i$ and $\sigma_i$ are the mean and standard deviation:
\begin{align}
\mu_i &= \frac{1}{m} \sum_{k \in \mathcal{S}_i} x_k \\
\sigma_i &= \sqrt{\frac{1}{m} \sum_{k \in \mathcal{S}_i} (x_k - \mu_i)^2 + \epsilon}
\end{align}
$\mathcal{S}_i$ is the set of indexes across which the mean and standard deviation
are calculated for index $i$.
$m$ is the size of the set $\mathcal{S}_i$ which is the same for all $i$.
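In code, choosing $\mathcal{S}_i$ amounts to choosing the axes to reduce over. Here is a minimal NumPy sketch of the computation above (the variable names, the per-channel axis choice, and the `eps` value are illustrative, not from the paper):

```python
import numpy as np

x = np.random.randn(2, 4, 3, 3)  # a batch of 2D images, shape (N, C, H, W)
axes = (0, 2, 3)                 # the reduction set S_i; here, per feature channel
eps = 1e-5

mu = x.mean(axis=axes, keepdims=True)                   # mu_i
sigma = np.sqrt(x.var(axis=axes, keepdims=True) + eps)  # sigma_i
x_hat = (x - mu) / sigma                                # normalized x
```

`keepdims=True` keeps `mu` and `sigma` broadcastable against `x`, so the final line applies the per-set statistics back to every element of that set.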
The definition of $\mathcal{S}_i$ is different for
[Batch normalization](https://nn.labml.ai/normalization/batch_norm/index.html),
[Layer normalization](https://nn.labml.ai/normalization/layer_norm/index.html), and
[Instance normalization](https://nn.labml.ai/normalization/instance_norm/index.html).
### [Batch Normalization](https://nn.labml.ai/normalization/batch_norm/index.html)
$$\mathcal{S}_i = \{k | k_C = i_C\}$$
The values that share the same feature channel are normalized together.
### [Layer Normalization](https://nn.labml.ai/normalization/layer_norm/index.html)
$$\mathcal{S}_i = \{k | k_N = i_N\}$$
The values from the same sample in the batch are normalized together.
### [Instance Normalization](https://nn.labml.ai/normalization/instance_norm/index.html)
$$\mathcal{S}_i = \{k | k_N = i_N, k_C = i_C\}$$
The values from the same sample and same feature channel are normalized together.
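For a 4-d $(N, C, H, W)$ tensor, these three reduction sets translate directly into reduction axes. A minimal NumPy sketch (the `norm` helper and `eps` value are illustrative):

```python
import numpy as np

def norm(x, axes, eps=1e-5):
    """Normalize over the given reduction axes (the set S_i)."""
    mu = x.mean(axis=axes, keepdims=True)
    return (x - mu) / np.sqrt(x.var(axis=axes, keepdims=True) + eps)

x = np.random.randn(2, 4, 3, 3)  # (N, C, H, W)

bn = norm(x, (0, 2, 3))    # batch norm: same channel -> reduce over N, H, W
ln = norm(x, (1, 2, 3))    # layer norm: same sample -> reduce over C, H, W
instn = norm(x, (2, 3))    # instance norm: same sample & channel -> reduce over H, W
```

Note that batch normalization is the only one whose statistics couple different samples in the batch, which is why it behaves differently at small batch sizes.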
### Group Normalization
$$\mathcal{S}_i = \{k | k_N = i_N,
\bigg \lfloor \frac{k_C}{C/G} \bigg \rfloor = \bigg \lfloor \frac{i_C}{C/G} \bigg \rfloor\}$$
where $G$ is the number of groups and $C$ is the number of channels.
Group normalization normalizes values of the same sample and the same group of channels together.
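The floor expressions above say that channels $k_C$ and $i_C$ belong to the same group exactly when they fall in the same block of $C/G$ consecutive channels. Reshaping makes the group an explicit axis; a NumPy sketch of this idea (the `group_norm` helper and `eps` value are illustrative, not the library's implementation):

```python
import numpy as np

def group_norm(x, groups, eps=1e-5):
    """Split the C channels into `groups` groups of C // groups consecutive
    channels and normalize each (sample, group) slice independently."""
    n, c, h, w = x.shape
    assert c % groups == 0
    # Channels k_C and i_C share a group when
    # floor(k_C / (C/G)) == floor(i_C / (C/G)).
    x = x.reshape(n, groups, c // groups, h, w)
    mu = x.mean(axis=(2, 3, 4), keepdims=True)
    var = x.var(axis=(2, 3, 4), keepdims=True)
    x = (x - mu) / np.sqrt(var + eps)
    return x.reshape(n, c, h, w)

x = np.random.randn(2, 8, 4, 4)  # (N, C, H, W)
y = group_norm(x, groups=4)      # 4 groups of 2 channels each
```

With `groups=1` this reduces to layer normalization, and with `groups=C` to instance normalization, so group normalization interpolates between the two.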
Here's a [CIFAR 10 classification model](https://nn.labml.ai/normalization/group_norm/experiment.html) that uses group normalization.
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/normalization/group_norm/experiment.ipynb)