📚 group norm improvements
@@ -15,60 +15,6 @@ This is based on the observation that classical features such as
The paper proposes dividing feature channels into groups and then separately normalizing
all channels within each group.
## Formulation
All normalization layers can be defined by the following computation.
$$\hat{x}_i = \frac{1}{\sigma_i} (x_i - \mu_i)$$
where $x$ is the tensor representing the batch,
and $i$ is the index of a single value.
For instance, for 2D images, $i = (i_N, i_C, i_H, i_W)$ is a 4-d vector indexing the image within the batch, the feature channel, and the vertical and horizontal coordinates.
$\mu_i$ and $\sigma_i$ are the mean and standard deviation.
\begin{align}
\mu_i &= \frac{1}{m} \sum_{k \in \mathcal{S}_i} x_k \\
\sigma_i &= \sqrt{\frac{1}{m} \sum_{k \in \mathcal{S}_i} (x_k - \mu_i)^2 + \epsilon}
\end{align}
$\mathcal{S}_i$ is the set of indexes across which the mean and standard deviation
are calculated for index $i$.
$m$ is the size of the set $\mathcal{S}_i$, which is the same for all $i$.
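To make this concrete, here is a minimal PyTorch sketch of this shared computation (the `normalize` helper and its `dims` argument are illustrative, not from the paper; the learnable scale and shift parameters are omitted). The choice of `dims` plays the role of $\mathcal{S}_i$: the statistics are shared across the dimensions being reduced.

```python
import torch

def normalize(x: torch.Tensor, dims, eps: float = 1e-5) -> torch.Tensor:
    # mu_i: mean over the reduction dimensions that define S_i
    mean = x.mean(dim=dims, keepdim=True)
    # sigma_i: standard deviation over the same dimensions, with epsilon for stability
    std = (((x - mean) ** 2).mean(dim=dims, keepdim=True) + eps).sqrt()
    return (x - mean) / std
```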
The definition of $\mathcal{S}_i$ is different for
[Batch normalization](https://nn.labml.ai/normalization/batch_norm/index.html),
[Layer normalization](https://nn.labml.ai/normalization/layer_norm/index.html), and
[Instance normalization](https://nn.labml.ai/normalization/instance_norm/index.html).
### [Batch Normalization](https://nn.labml.ai/normalization/batch_norm/index.html)
$$\mathcal{S}_i = \{k | k_C = i_C\}$$
The values that share the same feature channel are normalized together.
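In the `normalize` sketch above, for an `NCHW` tensor this means reducing over the batch and spatial dimensions, keeping one statistic per channel:

```python
x = torch.randn(8, 16, 32, 32)        # (N, C, H, W)
x_hat = normalize(x, dims=(0, 2, 3))  # statistics shared across N, H, W; one per channel
```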
### [Layer Normalization](https://nn.labml.ai/normalization/layer_norm/index.html)
$$\mathcal{S}_i = \{k | k_N = i_N\}$$
The values from the same sample in the batch are normalized together.
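In the same sketch (reusing `x` and `normalize` from above), this reduces over everything except the batch dimension:

```python
x_hat = normalize(x, dims=(1, 2, 3))  # statistics per sample, across C, H, W
```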
### [Instance Normalization](https://nn.labml.ai/normalization/instance_norm/index.html)
$$\mathcal{S}_i = \{k | k_N = i_N, k_C = i_C\}$$
The values from the same sample and same feature channel are normalized together.
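In the same sketch, this reduces over the spatial dimensions only:

```python
x_hat = normalize(x, dims=(2, 3))  # statistics per sample and per channel
```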
### Group Normalization
$$\mathcal{S}_i = \{k | k_N = i_N, \bigg \lfloor \frac{k_C}{C/G} \bigg \rfloor = \bigg \lfloor \frac{i_C}{C/G} \bigg \rfloor\}$$
where $G$ is the number of groups and $C$ is the number of channels ($C$ must be divisible by $G$).
Group normalization normalizes values of the same sample and the same group of channels together.
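A self-contained sketch of this (the group count `G = 4` is an arbitrary choice for illustration):

```python
import torch

x = torch.randn(8, 16, 32, 32)  # (N, C, H, W)
G = 4                           # number of groups; C must be divisible by G
N, C, H, W = x.shape
# Split the channel dimension into G groups of C // G channels each
xg = x.view(N, G, C // G, H, W)
# Statistics per sample and per group: reduce over channels-in-group, H and W
mean = xg.mean(dim=(2, 3, 4), keepdim=True)
std = (((xg - mean) ** 2).mean(dim=(2, 3, 4), keepdim=True) + 1e-5).sqrt()
x_hat = ((xg - mean) / std).view(N, C, H, W)
```

In practice PyTorch provides `torch.nn.GroupNorm` (with learnable scale and shift); the manual version above just mirrors the set definition.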
Here's a [CIFAR 10 classification model](https://nn.labml.ai/normalization/group_norm/experiment.html) that uses group normalization.
[Open In Colab](https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/normalization/group_norm/experiment.ipynb)