mirror of
https://github.com/labmlai/annotated_deep_learning_paper_implementations.git
synced 2025-08-14 09:31:42 +08:00
📚 group norm improvements
@@ -74,9 +74,9 @@
 </div>
 <h1>Group Normalization</h1>
 <p>This is a <a href="https://pytorch.org">PyTorch</a> implementation of
-the paper <a href="https://arxiv.org/abs/1803.08494">Group Normalization</a>.</p>
-<p><a href="../batch_norm/index.html">Batch Normalization</a> works well for sufficiently large batch sizes,
-but does not perform well for small batch sizes, because it normalizes across the batch.
+the <a href="https://arxiv.org/abs/1803.08494">Group Normalization</a> paper.</p>
+<p><a href="../batch_norm/index.html">Batch Normalization</a> works well for large enough batch sizes
+but not well for small batch sizes, because it normalizes over the batch.
 Training large models with large batch sizes is not possible due to the memory capacity of the
 devices.</p>
 <p>This paper introduces Group Normalization, which normalizes a set of features together as a group.
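The point the rewritten prose makes (batch normalization mixes statistics across samples, group normalization does not) can be checked numerically. The following is an illustrative numpy sketch, not code from this commit; the `group_norm` helper and shapes are invented for the demonstration:

```python
import numpy as np

def group_norm(x, groups, eps=1e-5):
    """Illustrative group normalization over an (N, C, H, W) input.

    Each sample's channels are split into `groups` groups, and each
    group is normalized by its own mean and variance, so the result
    for one sample does not depend on the rest of the batch.
    """
    n, c, h, w = x.shape
    g = x.reshape(n, groups, -1)            # (N, G, (C // G) * H * W)
    mean = g.mean(axis=-1, keepdims=True)
    var = g.var(axis=-1, keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4, 5, 5))

full = group_norm(x, groups=2)          # normalized within a batch of 8
single = group_norm(x[:1], groups=2)    # the first sample normalized alone

# Group norm output for sample 0 is identical whatever the batch size.
assert np.allclose(full[0], single[0])

# Batch-norm-style statistics, by contrast, mix samples: the per-channel
# mean over the batch changes when the batch composition changes.
bn_mean_full = x.mean(axis=(0, 2, 3))
bn_mean_one = x[:1].mean(axis=(0, 2, 3))
assert not np.allclose(bn_mean_full, bn_mean_one)
```

This is why group normalization keeps working at batch size 1, while batch normalization's estimates degrade as the batch shrinks.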
@@ -104,7 +104,7 @@ $\mu_i$ and $\sigma_i$ are mean and standard deviation.</p>
 </p>
 <p>$\mathcal{S}_i$ is the set of indexes across which the mean and standard deviation
 are calculated for index $i$.
-$m$ is the size of the set $\mathcal{S}_i$ which is same for all $i$.</p>
+$m$ is the size of the set $\mathcal{S}_i$ which is the same for all $i$.</p>
 <p>The definition of $\mathcal{S}_i$ is different for
 <a href="../batch_norm/index.html">Batch normalization</a>,
 <a href="../layer_norm/index.html">Layer normalization</a>, and
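Concretely, for group normalization of an $(N, C, H, W)$ input with $G$ groups, each $\mathcal{S}_i$ covers one sample's channel group, so $m = (C/G) \cdot H \cdot W$ for every $i$, and normalizing by $\mu_i$ and $\sigma_i$ leaves each group with zero mean and unit variance. A small numpy sketch (illustrative only; the shapes and names here are made up, not taken from the commit):

```python
import numpy as np

N, C, H, W, G = 2, 6, 4, 4, 3
x = np.random.default_rng(1).normal(size=(N, C, H, W))

# m = |S_i| is the same for every index i: one channel group of one sample.
m = (C // G) * H * W
assert m == 32

groups = x.reshape(N, G, -1)            # each row along axis 1 is one S_i
mu = groups.mean(axis=-1, keepdims=True)
sigma = groups.std(axis=-1, keepdims=True)
normed = (groups - mu) / sigma

# After normalization every group has (near-)zero mean and unit variance.
assert np.allclose(normed.mean(axis=-1), 0, atol=1e-7)
assert np.allclose(normed.std(axis=-1), 1, atol=1e-6)
```

Setting $G = C$ recovers a per-channel normalization like instance norm, and $G = 1$ normalizes all of a sample's features together like layer norm.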
@@ -8,10 +8,10 @@ summary: >
 # Group Normalization
 
 This is a [PyTorch](https://pytorch.org) implementation of
-the paper [Group Normalization](https://arxiv.org/abs/1803.08494).
+the [Group Normalization](https://arxiv.org/abs/1803.08494) paper.
 
-[Batch Normalization](../batch_norm/index.html) works well for sufficiently large batch sizes,
-but does not perform well for small batch sizes, because it normalizes across the batch.
+[Batch Normalization](../batch_norm/index.html) works well for large enough batch sizes
+but not well for small batch sizes, because it normalizes over the batch.
 Training large models with large batch sizes is not possible due to the memory capacity of the
 devices.
 
@@ -42,7 +42,7 @@ $\mu_i$ and $\sigma_i$ are mean and standard deviation.
 
 $\mathcal{S}_i$ is the set of indexes across which the mean and standard deviation
 are calculated for index $i$.
-$m$ is the size of the set $\mathcal{S}_i$ which is same for all $i$.
+$m$ is the size of the set $\mathcal{S}_i$ which is the same for all $i$.
 
 The definition of $\mathcal{S}_i$ is different for
 [Batch normalization](../batch_norm/index.html),
@@ -1,10 +1,10 @@
 # [Group Normalization](https://nn.labml.ai/normalization/group_norm/index.html)
 
 This is a [PyTorch](https://pytorch.org) implementation of
-the paper [Group Normalization](https://arxiv.org/abs/1803.08494).
+the [Group Normalization](https://arxiv.org/abs/1803.08494) paper.
 
-[Batch Normalization](https://nn.labml.ai/normalization/batch_norm/index.html) works well for sufficiently large batch sizes,
-but does not perform well for small batch sizes, because it normalizes across the batch.
+[Batch Normalization](https://nn.labml.ai/normalization/batch_norm/index.html) works well for large enough batch sizes
+but not well for small batch sizes, because it normalizes over the batch.
 Training large models with large batch sizes is not possible due to the memory capacity of the
 devices.
 
@@ -35,7 +35,7 @@ $\mu_i$ and $\sigma_i$ are mean and standard deviation.
 
 $\mathcal{S}_i$ is the set of indexes across which the mean and standard deviation
 are calculated for index $i$.
-$m$ is the size of the set $\mathcal{S}_i$ which is same for all $i$.
+$m$ is the size of the set $\mathcal{S}_i$ which is the same for all $i$.
 
 The definition of $\mathcal{S}_i$ is different for
 [Batch normalization](https://nn.labml.ai/normalization/batch_norm/index.html),