diff --git a/docs/normalization/group_norm/index.html b/docs/normalization/group_norm/index.html
index 6779f7b6..eefd7cb8 100644
--- a/docs/normalization/group_norm/index.html
+++ b/docs/normalization/group_norm/index.html
@@ -74,9 +74,9 @@
 This is a PyTorch implementation of
-the paper Group Normalization.
-Batch Normalization works well for sufficiently large batch sizes,
-but does not perform well for small batch sizes, because it normalizes across the batch.
+the Group Normalization paper.
+Batch Normalization works well for large enough batch sizes
+but not well for small batch sizes, because it normalizes over the batch.
 Training large models with large batch sizes is not possible due to the memory capacity of the devices.
 This paper introduces Group Normalization, which normalizes a set of features together as a group.
@@ -104,7 +104,7 @@
 $\mu_i$ and $\sigma_i$ are mean and standard deviation.
 $\mathcal{S}_i$ is the set of indexes across which the mean and standard deviation are calculated for index $i$.
-$m$ is the size of the set $\mathcal{S}_i$ which is same for all $i$.
+$m$ is the size of the set $\mathcal{S}_i$ which is the same for all $i$.
 The definition of $\mathcal{S}_i$ is different for Batch normalization, Layer normalization, and
diff --git a/labml_nn/normalization/group_norm/__init__.py b/labml_nn/normalization/group_norm/__init__.py
index 14968381..a5e9c310 100644
--- a/labml_nn/normalization/group_norm/__init__.py
+++ b/labml_nn/normalization/group_norm/__init__.py
@@ -8,10 +8,10 @@ summary: >
 # Group Normalization

 This is a [PyTorch](https://pytorch.org) implementation of
-the paper [Group Normalization](https://arxiv.org/abs/1803.08494).
+the [Group Normalization](https://arxiv.org/abs/1803.08494) paper.

-[Batch Normalization](../batch_norm/index.html) works well for sufficiently large batch sizes,
-but does not perform well for small batch sizes, because it normalizes across the batch.
+[Batch Normalization](../batch_norm/index.html) works well for large enough batch sizes
+but not well for small batch sizes, because it normalizes over the batch.
 Training large models with large batch sizes is not possible due to the memory capacity of the devices.

@@ -42,7 +42,7 @@
 $\mu_i$ and $\sigma_i$ are mean and standard deviation.
 $\mathcal{S}_i$ is the set of indexes across which the mean and standard deviation are calculated for index $i$.
-$m$ is the size of the set $\mathcal{S}_i$ which is same for all $i$.
+$m$ is the size of the set $\mathcal{S}_i$ which is the same for all $i$.

 The definition of $\mathcal{S}_i$ is different for
 [Batch normalization](../batch_norm/index.html),
diff --git a/labml_nn/normalization/group_norm/readme.md b/labml_nn/normalization/group_norm/readme.md
index 802f9dcc..57684bc2 100644
--- a/labml_nn/normalization/group_norm/readme.md
+++ b/labml_nn/normalization/group_norm/readme.md
@@ -1,10 +1,10 @@
 # [Group Normalization](https://nn.labml.ai/normalization/group_norm/index.html)

 This is a [PyTorch](https://pytorch.org) implementation of
-the paper [Group Normalization](https://arxiv.org/abs/1803.08494).
+the [Group Normalization](https://arxiv.org/abs/1803.08494) paper.

-[Batch Normalization](https://nn.labml.ai/normalization/batch_norm/index.html) works well for sufficiently large batch sizes,
-but does not perform well for small batch sizes, because it normalizes across the batch.
+[Batch Normalization](https://nn.labml.ai/normalization/batch_norm/index.html) works well for large enough batch sizes
+but not well for small batch sizes, because it normalizes over the batch.
 Training large models with large batch sizes is not possible due to the memory capacity of the devices.

@@ -35,7 +35,7 @@
 $\mu_i$ and $\sigma_i$ are mean and standard deviation.
 $\mathcal{S}_i$ is the set of indexes across which the mean and standard deviation are calculated for index $i$.
-$m$ is the size of the set $\mathcal{S}_i$ which is same for all $i$.
+$m$ is the size of the set $\mathcal{S}_i$ which is the same for all $i$.

 The definition of $\mathcal{S}_i$ is different for
 [Batch normalization](https://nn.labml.ai/normalization/batch_norm/index.html),
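For context on the behaviour the edited passages describe: in group normalization the set $\mathcal{S}_i$ groups channels of the same sample, so $\mu_i$ and $\sigma_i$ are computed per sample and per group rather than over the batch. Below is a minimal sketch of that computation in plain PyTorch for illustration only, not the `labml_nn` module itself; the function name, `num_groups` value, and tensor shapes are assumptions, and the learnable scale and shift parameters are omitted.

```python
import torch

def group_norm(x: torch.Tensor, num_groups: int, eps: float = 1e-5) -> torch.Tensor:
    # x has shape [batch_size, channels, height, width]
    batch_size, channels, height, width = x.shape
    assert channels % num_groups == 0
    # Collect each group's features into one normalization set S_i
    x = x.view(batch_size, num_groups, -1)
    # Mean and standard deviation over each group (the set S_i of size m)
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    x = (x - mean) / torch.sqrt(var + eps)
    return x.view(batch_size, channels, height, width)

# Statistics are per sample and per group, so even a batch size of 1 works
x = torch.randn(1, 32, 8, 8)
y = group_norm(x, num_groups=8)
print(y.mean().item(), y.std().item())  # roughly 0 and 1
```

Because no statistic is shared across the batch dimension, the result does not degrade for small batch sizes, which is the point the edited text makes about Batch Normalization.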