diff --git a/labml_nn/normalization/group_norm/readme.md b/labml_nn/normalization/group_norm/readme.md index 57684bc2..06afdb0d 100644 --- a/labml_nn/normalization/group_norm/readme.md +++ b/labml_nn/normalization/group_norm/readme.md @@ -15,60 +15,6 @@ This is based on the observation that classical features such as The paper proposes dividing feature channels into groups and then separately normalizing all channels within each group. -## Formulation - -All normalization layers can be defined by the following computation. - -$$\hat{x}_i = \frac{1}{\sigma_i} (x_i - \mu_i)$$ - -where $x$ is the tensor representing the batch, -and $i$ is the index of a single value. -For instance, when it's 2D images -$i = (i_N, i_C, i_H, i_W)$ is a 4-d vector for indexing -image within batch, feature channel, vertical coordinate and horizontal coordinate. -$\mu_i$ and $\sigma_i$ are mean and standard deviation. - -\begin{align} -\mu_i &= \frac{1}{m} \sum_{k \in \mathcal{S}_i} x_k \\ -\sigma_i &= \sqrt{\frac{1}{m} \sum_{k \in \mathcal{S}_i} (x_k - \mu_i)^2 + \epsilon} -\end{align} - -$\mathcal{S}_i$ is the set of indexes across which the mean and standard deviation -are calculated for index $i$. -$m$ is the size of the set $\mathcal{S}_i$ which is the same for all $i$. - -The definition of $\mathcal{S}_i$ is different for -[Batch normalization](https://nn.labml.ai/normalization/batch_norm/index.html), -[Layer normalization](https://nn.labml.ai/normalization/layer_norm/index.html), and -[Instance normalization](https://nn.labml.ai/normalization/instance_norm/index.html). - -### [Batch Normalization](https://nn.labml.ai/normalization/batch_norm/index.html) - -$$\mathcal{S}_i = \{k | k_C = i_C\}$$ - -The values that share the same feature channel are normalized together. - -### [Layer Normalization](https://nn.labml.ai/normalization/layer_norm/index.html) - -$$\mathcal{S}_i = \{k | k_N = i_N\}$$ - -The values from the same sample in the batch are normalized together. - -### [Instance Normalization](https://nn.labml.ai/normalization/instance_norm/index.html) - -$$\mathcal{S}_i = \{k | k_N = i_N, k_C = i_C\}$$ - -The values from the same sample and same feature channel are normalized together. - -### Group Normalization - -$$\mathcal{S}_i = \{k | k_N = i_N, - \bigg \lfloor \frac{k_C}{C/G} \bigg \rfloor = \bigg \lfloor \frac{i_C}{C/G} \bigg \rfloor\}$$ - -where $G$ is the number of groups and $C$ is the number of channels. - -Group normalization normalizes values of the same sample and the same group of channels together. - Here's a [CIFAR 10 classification model](https://nn.labml.ai/normalization/group_norm/experiment.html) that uses instance normalization. [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/normalization/group_norm/experiment.ipynb)