📚 group norm improvements

Varuna Jayasiri
2021-04-24 15:14:07 +05:30
parent 1a7f9c0816
commit 39b9826646
3 changed files with 12 additions and 12 deletions

File 1 of 3

@@ -74,9 +74,9 @@
 </div>
 <h1>Group Normalization</h1>
 <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of
-the paper <a href="https://arxiv.org/abs/1803.08494">Group Normalization</a>.</p>
-<p><a href="../batch_norm/index.html">Batch Normalization</a> works well for sufficiently large batch sizes,
-but does not perform well for small batch sizes, because it normalizes across the batch.
+the <a href="https://arxiv.org/abs/1803.08494">Group Normalization</a> paper.</p>
+<p><a href="../batch_norm/index.html">Batch Normalization</a> works well for large enough batch sizes
+but not well for small batch sizes, because it normalizes over the batch.
 Training large models with large batch sizes is not possible due to the memory capacity of the
 devices.</p>
 <p>This paper introduces Group Normalization, which normalizes a set of features together as a group.
@@ -104,7 +104,7 @@ $\mu_i$ and $\sigma_i$ are mean and standard deviation.</p>
 </p>
 <p>$\mathcal{S}_i$ is the set of indexes across which the mean and standard deviation
 are calculated for index $i$.
-$m$ is the size of the set $\mathcal{S}_i$ which is same for all $i$.</p>
+$m$ is the size of the set $\mathcal{S}_i$ which is the same for all $i$.</p>
 <p>The definition of $\mathcal{S}_i$ is different for
 <a href="../batch_norm/index.html">Batch normalization</a>,
 <a href="../layer_norm/index.html">Layer normalization</a>, and

File 2 of 3

@@ -8,10 +8,10 @@ summary: >
 # Group Normalization
 This is a [PyTorch](https://pytorch.org) implementation of
-the paper [Group Normalization](https://arxiv.org/abs/1803.08494).
-[Batch Normalization](../batch_norm/index.html) works well for sufficiently large batch sizes,
-but does not perform well for small batch sizes, because it normalizes across the batch.
+the [Group Normalization](https://arxiv.org/abs/1803.08494) paper.
+[Batch Normalization](../batch_norm/index.html) works well for large enough batch sizes
+but not well for small batch sizes, because it normalizes over the batch.
 Training large models with large batch sizes is not possible due to the memory capacity of the
 devices.
@@ -42,7 +42,7 @@ $\mu_i$ and $\sigma_i$ are mean and standard deviation.
 $\mathcal{S}_i$ is the set of indexes across which the mean and standard deviation
 are calculated for index $i$.
-$m$ is the size of the set $\mathcal{S}_i$ which is same for all $i$.
+$m$ is the size of the set $\mathcal{S}_i$ which is the same for all $i$.
 The definition of $\mathcal{S}_i$ is different for
 [Batch normalization](../batch_norm/index.html),

File 3 of 3

@@ -1,10 +1,10 @@
 # [Group Normalization](https://nn.labml.ai/normalization/group_norm/index.html)
 This is a [PyTorch](https://pytorch.org) implementation of
-the paper [Group Normalization](https://arxiv.org/abs/1803.08494).
-[Batch Normalization](https://nn.labml.ai/normalization/batch_norm/index.html) works well for sufficiently large batch sizes,
-but does not perform well for small batch sizes, because it normalizes across the batch.
+the [Group Normalization](https://arxiv.org/abs/1803.08494) paper.
+[Batch Normalization](https://nn.labml.ai/normalization/batch_norm/index.html) works well for large enough batch sizes
+but not well for small batch sizes, because it normalizes over the batch.
 Training large models with large batch sizes is not possible due to the memory capacity of the
 devices.
@@ -35,7 +35,7 @@ $\mu_i$ and $\sigma_i$ are mean and standard deviation.
 $\mathcal{S}_i$ is the set of indexes across which the mean and standard deviation
 are calculated for index $i$.
-$m$ is the size of the set $\mathcal{S}_i$ which is same for all $i$.
+$m$ is the size of the set $\mathcal{S}_i$ which is the same for all $i$.
 The definition of $\mathcal{S}_i$ is different for
 [Batch normalization](https://nn.labml.ai/normalization/batch_norm/index.html),
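For readers who only want to apply the technique rather than study the repo's implementation, PyTorch also ships it as the built-in `torch.nn.GroupNorm` module; a short usage sketch with illustrative sizes:

```python
import torch
import torch.nn as nn

# 32 groups over 256 channels; learned affine scale/shift is on by default
gn = nn.GroupNorm(num_groups=32, num_channels=256)
x = torch.randn(2, 256, 16, 16)  # small batch sizes are fine
print(gn(x).shape)  # torch.Size([2, 256, 16, 16])
```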