mirror of
https://github.com/labmlai/annotated_deep_learning_paper_implementations.git
synced 2025-08-14 09:31:42 +08:00
📚 group norm improvements
@@ -74,9 +74,9 @@
 </div>
 <h1>Group Normalization</h1>
 <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of
-the paper <a href="https://arxiv.org/abs/1803.08494">Group Normalization</a>.</p>
-<p><a href="../batch_norm/index.html">Batch Normalization</a> works well for sufficiently large batch sizes,
-but does not perform well for small batch sizes, because it normalizes across the batch.
+the <a href="https://arxiv.org/abs/1803.08494">Group Normalization</a> paper.</p>
+<p><a href="../batch_norm/index.html">Batch Normalization</a> works well for large enough batch sizes
+but not well for small batch sizes, because it normalizes over the batch.
 Training large models with large batch sizes is not possible due to the memory capacity of the
 devices.</p>
 <p>This paper introduces Group Normalization, which normalizes a set of features together as a group.
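The point the rewritten prose makes (batch normalization mixes statistics across samples, group normalization does not) can be checked numerically. The following is an illustrative numpy sketch, not code from this commit; the `group_norm` helper and shapes are invented for the demonstration:

```python
import numpy as np

def group_norm(x, groups, eps=1e-5):
    """Illustrative group normalization over an (N, C, H, W) input.

    Each sample's channels are split into `groups` groups, and each
    group is normalized by its own mean and variance, so the result
    for one sample does not depend on the rest of the batch.
    """
    n, c, h, w = x.shape
    g = x.reshape(n, groups, -1)            # (N, G, (C // G) * H * W)
    mean = g.mean(axis=-1, keepdims=True)
    var = g.var(axis=-1, keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4, 5, 5))

full = group_norm(x, groups=2)          # normalized within a batch of 8
single = group_norm(x[:1], groups=2)    # the first sample normalized alone

# Group norm output for sample 0 is identical whatever the batch size.
assert np.allclose(full[0], single[0])

# Batch-norm-style statistics, by contrast, mix samples: the per-channel
# mean over the batch changes when the batch composition changes.
bn_mean_full = x.mean(axis=(0, 2, 3))
bn_mean_one = x[:1].mean(axis=(0, 2, 3))
assert not np.allclose(bn_mean_full, bn_mean_one)
```

This is why group normalization keeps working at batch size 1, while batch normalization's estimates degrade as the batch shrinks.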
@@ -104,7 +104,7 @@ $\mu_i$ and $\sigma_i$ are mean and standard deviation.</p>
 </p>
 <p>$\mathcal{S}_i$ is the set of indexes across which the mean and standard deviation
 are calculated for index $i$.
-$m$ is the size of the set $\mathcal{S}_i$ which is same for all $i$.</p>
+$m$ is the size of the set $\mathcal{S}_i$ which is the same for all $i$.</p>
 <p>The definition of $\mathcal{S}_i$ is different for
 <a href="../batch_norm/index.html">Batch normalization</a>,
 <a href="../layer_norm/index.html">Layer normalization</a>, and
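Concretely, for group normalization of an $(N, C, H, W)$ input with $G$ groups, each $\mathcal{S}_i$ covers one sample's channel group, so $m = (C/G) \cdot H \cdot W$ for every $i$, and normalizing by $\mu_i$ and $\sigma_i$ leaves each group with zero mean and unit variance. A small numpy sketch (illustrative only; the shapes and names here are made up, not taken from the commit):

```python
import numpy as np

N, C, H, W, G = 2, 6, 4, 4, 3
x = np.random.default_rng(1).normal(size=(N, C, H, W))

# m = |S_i| is the same for every index i: one channel group of one sample.
m = (C // G) * H * W
assert m == 32

groups = x.reshape(N, G, -1)            # each row along axis 1 is one S_i
mu = groups.mean(axis=-1, keepdims=True)
sigma = groups.std(axis=-1, keepdims=True)
normed = (groups - mu) / sigma

# After normalization every group has (near-)zero mean and unit variance.
assert np.allclose(normed.mean(axis=-1), 0, atol=1e-7)
assert np.allclose(normed.std(axis=-1), 1, atol=1e-6)
```

Setting $G = C$ recovers a per-channel normalization like instance norm, and $G = 1$ normalizes all of a sample's features together like layer norm.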
@@ -8,10 +8,10 @@ summary: >
 # Group Normalization
 
 This is a [PyTorch](https://pytorch.org) implementation of
-the paper [Group Normalization](https://arxiv.org/abs/1803.08494).
+the [Group Normalization](https://arxiv.org/abs/1803.08494) paper.
 
-[Batch Normalization](../batch_norm/index.html) works well for sufficiently large batch sizes,
-but does not perform well for small batch sizes, because it normalizes across the batch.
+[Batch Normalization](../batch_norm/index.html) works well for large enough batch sizes
+but not well for small batch sizes, because it normalizes over the batch.
 Training large models with large batch sizes is not possible due to the memory capacity of the
 devices.
 
@@ -42,7 +42,7 @@ $\mu_i$ and $\sigma_i$ are mean and standard deviation.
 
 $\mathcal{S}_i$ is the set of indexes across which the mean and standard deviation
 are calculated for index $i$.
-$m$ is the size of the set $\mathcal{S}_i$ which is same for all $i$.
+$m$ is the size of the set $\mathcal{S}_i$ which is the same for all $i$.
 
 The definition of $\mathcal{S}_i$ is different for
 [Batch normalization](../batch_norm/index.html),
@@ -1,10 +1,10 @@
 # [Group Normalization](https://nn.labml.ai/normalization/group_norm/index.html)
 
 This is a [PyTorch](https://pytorch.org) implementation of
-the paper [Group Normalization](https://arxiv.org/abs/1803.08494).
+the [Group Normalization](https://arxiv.org/abs/1803.08494) paper.
 
-[Batch Normalization](https://nn.labml.ai/normalization/batch_norm/index.html) works well for sufficiently large batch sizes,
-but does not perform well for small batch sizes, because it normalizes across the batch.
+[Batch Normalization](https://nn.labml.ai/normalization/batch_norm/index.html) works well for large enough batch sizes
+but not well for small batch sizes, because it normalizes over the batch.
 Training large models with large batch sizes is not possible due to the memory capacity of the
 devices.
 
@@ -35,7 +35,7 @@ $\mu_i$ and $\sigma_i$ are mean and standard deviation.
 
 $\mathcal{S}_i$ is the set of indexes across which the mean and standard deviation
 are calculated for index $i$.
-$m$ is the size of the set $\mathcal{S}_i$ which is same for all $i$.
+$m$ is the size of the set $\mathcal{S}_i$ which is the same for all $i$.
 
 The definition of $\mathcal{S}_i$ is different for
 [Batch normalization](https://nn.labml.ai/normalization/batch_norm/index.html),