This is an implementation of Generative Adversarial Networks.
The generator, $G(\pmb{z}; \theta_g)$, generates samples that match the data distribution, while the discriminator, $D(\pmb{x}; \theta_d)$, gives the probability that $\pmb{x}$ came from the data rather than from $G$.
We train $D$ and $G$ simultaneously in a two-player min-max game with value function $V(G, D)$:

$$\min_G \max_D V(D, G) =
    \mathbb{E}_{\pmb{x} \sim p_{data}(\pmb{x})} \big[\log D(\pmb{x})\big] +
    \mathbb{E}_{\pmb{z} \sim p_{\pmb{z}}(\pmb{z})} \big[\log \big(1 - D(G(\pmb{z}))\big)\big]$$

where $p_{data}(\pmb{x})$ is the probability distribution over the data, and $p_{\pmb{z}}(\pmb{z})$ is the prior distribution of the noise input $\pmb{z}$, which is set to Gaussian noise.
This file defines the loss functions. There is an MNIST example with two multilayer perceptrons for the generator and the discriminator.
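The MNIST example itself lives in a separate experiment file; the following is only a rough sketch of what such a pair of MLPs could look like, assuming 28×28 images flattened to 784 values and a 100-dimensional noise vector. The class names, layer sizes, and activations are illustrative choices, not the code of that example.

```python
import torch
import torch.nn as nn


class SimpleGenerator(nn.Module):
    """Maps a 100-dimensional noise vector z to a flattened 28x28 image."""
    def __init__(self, latent_dim: int = 100):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, 784), nn.Tanh(),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.layers(z)


class SimpleDiscriminator(nn.Module):
    """Maps a flattened 28x28 image to a single logit (pre-sigmoid score)."""
    def __init__(self):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(784, 512), nn.LeakyReLU(0.2),
            nn.Linear(512, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1),  # a logit, not a probability
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)
```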
import torch
import torch.nn as nn
import torch.utils.data

from labml_helpers.module import Module
The discriminator should ascend on the gradient

$$\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \Big[\log D\big(\pmb{x}^{(i)}\big) +
    \log \Big(1 - D\big(G\big(\pmb{z}^{(i)}\big)\big)\Big)\Big]$$

where $m$ is the mini-batch size and $(i)$ is used to index samples in the mini-batch. $\pmb{x}$ are samples from $p_{data}$ and $\pmb{z}$ are samples from $p_{\pmb{z}}$.
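Equivalently, ascending on that gradient is the same as descending on its negation,

$$-\frac{1}{m} \sum_{i=1}^{m} \Big[\log D\big(\pmb{x}^{(i)}\big) +
    \log \Big(1 - D\big(G\big(\pmb{z}^{(i)}\big)\big)\Big)\Big]$$

which is the quantity the binary cross entropy based loss below minimizes.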
class DiscriminatorLogitsLoss(Module):
    def __init__(self, smoothing: float = 0.2):
        super().__init__()
We use PyTorch's binary cross entropy loss, which is $-\sum\Big[y \log(\hat{y}) + (1 - y) \log(1 - \hat{y})\Big]$, where $y$ are the labels and $\hat{y}$ are the predictions. Note the negative sign. We use labels equal to $1$ for $\pmb{x}$ from $p_{data}$ and labels equal to $0$ for $\pmb{x}$ from $p_G$. Then descending on the sum of these losses is the same as ascending on the above gradient.
BCEWithLogitsLoss combines a sigmoid layer and binary cross entropy loss.
        self.loss_true = nn.BCEWithLogitsLoss()
        self.loss_false = nn.BCEWithLogitsLoss()
We use label smoothing because it seems to work better in some cases.
        self.smoothing = smoothing
The labels are registered as buffers, with persistence set to False, so they are not saved as part of the model's state.
        self.register_buffer('labels_true', _create_labels(256, 1.0 - smoothing, 1.0), False)
        self.register_buffer('labels_false', _create_labels(256, 0.0, smoothing), False)
logits_true are the logits from $D(\pmb{x}^{(i)})$ and logits_false are the logits from $D(G(\pmb{z}^{(i)}))$.
    def __call__(self, logits_true: torch.Tensor, logits_false: torch.Tensor):
        if len(logits_true) > len(self.labels_true):
            self.register_buffer("labels_true",
                                 _create_labels(len(logits_true), 1.0 - self.smoothing, 1.0, logits_true.device), False)
        if len(logits_false) > len(self.labels_false):
            self.register_buffer("labels_false",
                                 _create_labels(len(logits_false), 0.0, self.smoothing, logits_false.device), False)

        return (self.loss_true(logits_true, self.labels_true[:len(logits_true)]),
                self.loss_false(logits_false, self.labels_false[:len(logits_false)]))
The generator should descend on the gradient

$$\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log \Big(1 - D\big(G\big(\pmb{z}^{(i)}\big)\big)\Big)$$

class GeneratorLogitsLoss(Module):
    def __init__(self, smoothing: float = 0.2):
        super().__init__()
        self.loss_true = nn.BCEWithLogitsLoss()
        self.smoothing = smoothing
We use labels equal to $1$ for samples from $p_G$. Descending on this loss increases $D(G(\pmb{z}))$, which corresponds to descending on the above gradient (this is the usual non-saturating form of the generator loss).
        self.register_buffer('fake_labels', _create_labels(256, 1.0 - smoothing, 1.0), False)
    def __call__(self, logits: torch.Tensor):
        if len(logits) > len(self.fake_labels):
            self.register_buffer("fake_labels",
                                 _create_labels(len(logits), 1.0 - self.smoothing, 1.0, logits.device), False)

        return self.loss_true(logits, self.fake_labels[:len(logits)])
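Similarly, a generator update (again a sketch using the same hypothetical objects, not code from this file) feeds generated samples through the discriminator and descends on this loss:

```python
g_optimizer = torch.optim.Adam(generator.parameters(), lr=2e-4)
generator_loss = GeneratorLogitsLoss(smoothing=0.2)


def generator_step(batch_size: int):
    """One generator update: push D(G(z)) towards the 'real' label."""
    g_optimizer.zero_grad()
    z = torch.randn(batch_size, 100)      # z ~ p_z
    logits = discriminator(generator(z))  # logits of D(G(z))
    loss = generator_loss(logits)
    loss.backward()                       # gradients reach D too, but only G's parameters are stepped
    g_optimizer.step()
```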
Create smoothed labels by sampling uniformly from the range $[r1, r2]$.
def _create_labels(n: int, r1: float, r2: float, device: torch.device = None):
    return torch.empty(n, 1, requires_grad=False, device=device).uniform_(r1, r2)
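For example, with the default smoothing of $0.2$, the "real" labels are drawn uniformly from $[0.8, 1.0]$ and the "fake" labels from $[0.0, 0.2]$ (the values in the comments below are illustrative, since the sampling is random):

```python
labels_true = _create_labels(4, 0.8, 1.0)   # e.g. tensor([[0.93], [0.81], [0.99], [0.86]])
labels_false = _create_labels(4, 0.0, 0.2)  # e.g. tensor([[0.05], [0.17], [0.02], [0.11]])
print(labels_true.shape)                    # torch.Size([4, 1])
```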