mirror of
https://github.com/labmlai/annotated_deep_learning_paper_implementations.git
synced 2025-08-14 09:31:42 +08:00
1603 lines
93 KiB
HTML
<!DOCTYPE html>
|
|
<html>
|
|
<head>
|
|
<meta http-equiv="content-type" content="text/html;charset=utf-8"/>
|
|
<meta name="viewport" content="width=device-width, initial-scale=1.0"/>
|
|
<meta name="description" content="This is an annotated implementation/tutorial of Regret Minimization in Games with Incomplete Information"/>
|
|
|
|
<meta name="twitter:card" content="summary"/>
|
|
<meta name="twitter:image:src" content="https://avatars1.githubusercontent.com/u/64068543?s=400&v=4"/>
|
|
<meta name="twitter:title" content="Regret Minimization in Games with Incomplete Information (CFR)"/>
|
|
<meta name="twitter:description" content="This is an annotated implementation/tutorial of Regret Minimization in Games with Incomplete Information"/>
|
|
<meta name="twitter:site" content="@labmlai"/>
|
|
<meta name="twitter:creator" content="@labmlai"/>
|
|
|
|
<meta property="og:url" content="https://nn.labml.ai/cfr/index.html"/>
|
|
<meta property="og:title" content="Regret Minimization in Games with Incomplete Information (CFR)"/>
|
|
<meta property="og:image" content="https://avatars1.githubusercontent.com/u/64068543?s=400&v=4"/>
|
|
<meta property="og:site_name" content="LabML Neural Networks"/>
|
|
<meta property="og:type" content="object"/>
|
|
|
|
<meta property="og:description" content="This is an annotated implementation/tutorial of Regret Minimization in Games with Incomplete Information"/>
|
|
|
|
<title>Regret Minimization in Games with Incomplete Information (CFR)</title>
|
|
<link rel="shortcut icon" href="/icon.png"/>
|
|
<link rel="stylesheet" href="../pylit.css">
|
|
<link rel="canonical" href="https://nn.labml.ai/cfr/index.html"/>
|
|
<!-- Global site tag (gtag.js) - Google Analytics -->
|
|
<script async src="https://www.googletagmanager.com/gtag/js?id=G-4V3HC8HBLH"></script>
|
|
<script>
|
|
window.dataLayer = window.dataLayer || [];
|
|
|
|
function gtag() {
|
|
dataLayer.push(arguments);
|
|
}
|
|
|
|
gtag('js', new Date());
|
|
|
|
gtag('config', 'G-4V3HC8HBLH');
|
|
</script>
|
|
</head>
|
|
<body>
|
|
<div id='container'>
|
|
<div id="background"></div>
|
|
<div class='section'>
|
|
<div class='docs'>
|
|
<p>
|
|
<a class="parent" href="/">home</a>
|
|
<a class="parent" href="index.html">cfr</a>
|
|
</p>
|
|
<p>
|
|
|
|
<a href="https://github.com/lab-ml/labml_nn/tree/master/labml_nn/cfr/__init__.py">
|
|
<img alt="Github"
|
|
src="https://img.shields.io/github/stars/lab-ml/nn?style=social"
|
|
style="max-width:100%;"/></a>
|
|
<a href="https://twitter.com/labmlai"
|
|
rel="nofollow">
|
|
<img alt="Twitter"
|
|
src="https://img.shields.io/twitter/follow/labmlai?style=social"
|
|
style="max-width:100%;"/></a>
|
|
</p>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-0'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-0'>#</a>
|
|
</div>
|
|
<h1>Regret Minimization in Games with Incomplete Information (CFR)</h1>
|
|
<p>The paper
|
|
<a href="http://martin.zinkevich.org/publications/regretpoker.pdf">Regret Minimization in Games with Incomplete Information</a>
|
|
introduces counterfactual regret and how minimizing counterfactual regret through self-play
|
|
can be used to reach Nash equilibrium.
|
|
The algorithm is called Counterfactual Regret Minimization (<strong>CFR</strong>).</p>
|
|
<p>The paper
|
|
<a href="http://mlanctot.info/files/papers/nips09mccfr.pdf">Monte Carlo Sampling for Regret Minimization in Extensive Games</a>
|
|
introduces Monte Carlo Counterfactual Regret Minimization (<strong>MCCFR</strong>),
|
|
where we sample from the game tree and estimate the regrets.</p>
|
|
<p>We tried to keep our Python implementation easy-to-understand like a tutorial.
|
|
We run it on <a href="kuhn/index.html">a very simple imperfect information game called Kuhn poker</a>.</p>
|
|
<p><a href="https://colab.research.google.com/github/lab-ml/nn/blob/master/labml_nn/cfr/kuhn/experiment.ipynb"><img alt="Open In Colab" src="https://colab.research.google.com/assets/colab-badge.svg" /></a></p>
|
|
<p><a href="https://twitter.com/labmlai/status/1407186002255380484"><img alt="Twitter thread" src="https://img.shields.io/twitter/url?style=social&url=https%3A%2F%2Ftwitter.com%2Flabmlai%2Fstatus%2F1407186002255380484" /></a>
|
|
Twitter thread</p>
|
|
<h2>Introduction</h2>
|
|
<p>We implement Monte Carlo Counterfactual Regret Minimization (MCCFR) with chance sampling (CS).
|
|
It iteratively explores part of the game tree by trying all player actions,
|
|
but sampling chance events.
|
|
Chance events are things like dealing cards; they are sampled once per iteration.
|
|
Then it calculates, for each action, the <em>regret</em> of following the current strategy instead of taking that action.
|
|
Then it updates the strategy based on these regrets for the next iteration, using regret matching.
|
|
Finally, it computes the average of the strategies throughout the iterations,
|
|
which is very close to a Nash equilibrium if we run enough iterations.</p>
|
|
<p>We will first introduce the mathematical notation and theory.</p>
|
|
<h3>Player</h3>
|
|
<p>A player is denoted by $i \in N$, where $N$ is the set of players.</p>
|
|
<h3><a href="#History">History</a></h3>
|
|
<p>History $h \in H$ is a sequence of actions including chance events,
|
|
and $H$ is the set of all histories.</p>
|
|
<p>$Z \subseteq H$ is the set of terminal histories (game over).</p>
|
|
<h3>Action</h3>
|
|
<p>An action is denoted by $a$. $A(h) = \{a: (h, a) \in H\}$ is the set of actions available at a non-terminal <a href="#History">history</a> $h \in H$.</p>
|
|
<h3><a href="#InfoSet">Information Set $I_i$</a></h3>
|
|
<p><strong>Information set</strong> $I_i \in \mathcal{I}_i$ for player $i$
|
|
is similar to a history $h \in H$
|
|
but only contains the actions visible to player $i$.
|
|
That is, the history $h$ will contain actions/events such as cards dealt to the
|
|
opposing player while $I_i$ will not have them.</p>
|
|
<p>$\mathcal{I}_i$ is known as the <strong>information partition</strong> of player $i$.</p>
|
|
<p>We write $h \in I$ for the histories that belong to a given information set;
i.e. all those histories look the same to the player.</p>
|
|
<p><a id="Strategy"></a></p>
|
|
<h3>Strategy</h3>
|
|
<p><strong>Strategy of player</strong> $i$, $\sigma_i \in \Sigma_i$ is a distribution over actions $A(I_i)$,
|
|
where $\Sigma_i$ is the set of all strategies for player $i$.
|
|
The strategy on the $t$-th iteration is denoted by $\sigma^t_i$.</p>
|
|
<p>A strategy is defined as the probability of taking an action $a$ at a given information set $I$,</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">\sigma_i(I)(a)</script>
|
|
</p>
|
|
<p>$\sigma$ is the <strong>strategy profile</strong> which consists of strategies of all players
|
|
$\sigma_1, \sigma_2, \ldots$</p>
|
|
<p>$\sigma_{-i}$ denotes the strategies of all players except player $i$.</p>
|
|
<p><a id="HistoryProbability"></a></p>
|
|
<h3>Probability of History</h3>
|
|
<p>$\pi^\sigma(h)$ is the probability of reaching the history $h$ with strategy profile $\sigma$.
|
|
$\pi^\sigma(h)_{-i}$ is the probability of reaching $h$ without player $i$’s contribution;
|
|
i.e. player $i$ took the actions to follow $h$ with a probability of $1$.</p>
|
|
<p>$\pi^\sigma(h)_{i}$ is the probability of reaching $h$ with only player $i$’s contribution.
|
|
That is,
|
|
<script type="math/tex; mode=display">\pi^\sigma(h) = \pi^\sigma(h)_{i} \pi^\sigma(h)_{-i}</script>
|
|
</p>
|
|
<p>The probability of reaching an information set $I$ is,
|
|
<script type="math/tex; mode=display">\pi^\sigma(I) = \sum_{h \in I} \pi^\sigma(h)</script>
|
|
</p>
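<p>The factorization above can be checked numerically with a minimal sketch. It assumes a hypothetical representation where a history is a list of (actor, probability) pairs; the names <code>reach_probability</code>, <code>only</code> and <code>exclude</code> are illustrative and are not part of this implementation. Chance is counted in $\pi^\sigma(h)_{-i}$.</p>

```python
from math import prod
from typing import List, Optional, Tuple, Union

# Hypothetical representation: each step is (actor, probability of the action taken).
# Actor 'c' stands for chance; 1 and 2 are the players.
Actor = Union[str, int]
Step = Tuple[Actor, float]


def reach_probability(history: List[Step],
                      only: Optional[Actor] = None,
                      exclude: Optional[Actor] = None) -> float:
    """Product of action probabilities along a history, optionally filtered by actor."""
    probs = [p for actor, p in history
             if (only is None or actor == only)
             and (exclude is None or actor != exclude)]
    return prod(probs)


# Chance deals a card (1/3), player 1 bets with probability 0.5,
# player 2 calls with probability 0.25
h = [('c', 1 / 3), (1, 0.5), (2, 0.25)]

pi = reach_probability(h)                     # pi^sigma(h)
pi_1 = reach_probability(h, only=1)           # pi^sigma(h)_i for i = 1
pi_minus_1 = reach_probability(h, exclude=1)  # pi^sigma(h)_{-i}, chance included
```

<p>Here the product of <code>pi_1</code> and <code>pi_minus_1</code> recovers <code>pi</code> up to floating point error.</p>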
|
|
<h3>Utility (Pay off)</h3>
|
|
<p>The <a href="#terminal_utility">terminal utility</a> is the utility (or pay off)
|
|
of a player $i$ for a terminal history $h$.</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">u_i(h)</script> where $h \in Z$</p>
|
|
<p>$u_i(\sigma)$ is the expected utility (payoff) for player $i$ with strategy profile $\sigma$.</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">u_i(\sigma) = \sum_{h \in Z} u_i(h) \pi^\sigma(h)</script>
|
|
</p>
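<p>As an illustration of the sum over terminal histories, here is a small sketch on matching pennies, a game chosen only for this example and not used elsewhere in this implementation. Each terminal history is a pair of actions, and its reach probability is the product of the two players' action probabilities. All names below are hypothetical.</p>

```python
from itertools import product
from typing import Dict, Tuple

# Hypothetical two-player game: matching pennies.
# Player 1 wins 1 when the coins match and loses 1 otherwise.
ACTIONS = ('H', 'T')


def terminal_utility_1(z: Tuple[str, str]) -> float:
    """u_1(z) for a terminal history z = (action of player 1, action of player 2)."""
    return 1.0 if z[0] == z[1] else -1.0


def expected_utility_1(sigma_1: Dict[str, float], sigma_2: Dict[str, float]) -> float:
    """u_1(sigma) = sum over terminal histories z of u_1(z) * pi^sigma(z)."""
    total = 0.0
    for z in product(ACTIONS, ACTIONS):
        reach = sigma_1[z[0]] * sigma_2[z[1]]  # pi^sigma(z)
        total += terminal_utility_1(z) * reach
    return total


uniform = {'H': 0.5, 'T': 0.5}
always_heads = {'H': 1.0, 'T': 0.0}

u_uniform = expected_utility_1(uniform, uniform)
u_both_heads = expected_utility_1(always_heads, always_heads)
```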
|
|
<p><a id="NashEquilibrium"></a></p>
|
|
<h3>Nash Equilibrium</h3>
|
|
<p>Nash equilibrium is a state where none of the players can increase their expected utility (or payoff)
|
|
by changing their strategy alone.</p>
|
|
<p>For two players, Nash equilibrium is a <a href="#Strategy">strategy profile</a> where</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">\begin{align}
|
|
u_1(\sigma) &\ge \max_{\sigma'_1 \in \Sigma_1} u_1(\sigma'_1, \sigma_2) \\
|
|
u_2(\sigma) &\ge \max_{\sigma'_2 \in \Sigma_2} u_2(\sigma_1, \sigma'_2) \\
|
|
\end{align}</script>
|
|
</p>
|
|
<p>$\epsilon$-Nash equilibrium is,</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">\begin{align}
|
|
u_1(\sigma) + \epsilon &\ge \max_{\sigma'_1 \in \Sigma_1} u_1(\sigma'_1, \sigma_2) \\
|
|
u_2(\sigma) + \epsilon &\ge \max_{\sigma'_2 \in \Sigma_2} u_2(\sigma_1, \sigma'_2) \\
|
|
\end{align}</script>
|
|
</p>
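<p>The smallest $\epsilon$ that satisfies both inequalities can be computed directly for a small game. The sketch below again uses matching pennies as an assumed example; <code>nash_gap</code> and the other names are hypothetical. It relies on the fact that expected utility is linear in a player's own strategy, so a best response can always be found among the pure strategies.</p>

```python
from typing import Dict

ACTIONS = ('H', 'T')


def u1(s1: Dict[str, float], s2: Dict[str, float]) -> float:
    """Expected utility of player 1 in matching pennies; u_2 = -u_1."""
    return sum(s1[a] * s2[b] * (1.0 if a == b else -1.0)
               for a in ACTIONS for b in ACTIONS)


def pure(action: str) -> Dict[str, float]:
    """Strategy that always plays the given action."""
    return {a: 1.0 if a == action else 0.0 for a in ACTIONS}


def nash_gap(s1: Dict[str, float], s2: Dict[str, float]) -> float:
    """Smallest epsilon for which (s1, s2) is an epsilon-Nash equilibrium."""
    current_1 = u1(s1, s2)
    current_2 = -current_1
    # Best deviation of each player while the other player's strategy stays fixed
    best_1 = max(u1(pure(a), s2) for a in ACTIONS)
    best_2 = max(-u1(s1, pure(b)) for b in ACTIONS)
    return max(best_1 - current_1, best_2 - current_2)


uniform = {'H': 0.5, 'T': 0.5}
biased = {'H': 0.6, 'T': 0.4}

gap_equilibrium = nash_gap(uniform, uniform)
gap_biased = nash_gap(biased, uniform)
```

<p>The uniform profile has a gap of zero, while a player 1 who favours heads can be exploited by player 2.</p>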
|
|
<h3>Regret Minimization</h3>
|
|
<p>Regret is the utility (or pay off) that the player didn’t get because
|
|
she didn’t follow the optimal strategy or take the best action.</p>
|
|
<p>Average overall regret for Player $i$ is the average regret of not following the
|
|
optimal strategy in all $T$ rounds of iterations.</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">R^T_i = \frac{1}{T} \max_{\sigma^*_i \in \Sigma_i} \sum_{t=1}^T
|
|
\Big( u_i(\sigma^*_i, \sigma^t_{-i}) - u_i(\sigma^t) \Big)</script>
|
|
</p>
|
|
<p>where $\sigma^t$ is the strategy profile of all players in iteration $t$,
|
|
and</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">(\sigma^*_i, \sigma^t_{-i})</script>
|
|
</p>
|
|
<p>is the strategy profile $\sigma^t$ with player $i$’s strategy
|
|
replaced with $\sigma^*_i$.</p>
|
|
<p>The average strategy is the average of strategies followed in each round,
|
|
for all $I \in \mathcal{I}, a \in A(I)$</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">\color{cyan}{\bar{\sigma}^T_i(I)(a)} =
|
|
\frac{\sum_{t=1}^T \pi_i^{\sigma^t}(I)\color{lightgreen}{\sigma^t(I)(a)}}{\sum_{t=1}^T \pi_i^{\sigma^t}(I)}</script>
|
|
</p>
|
|
<p>That is, the strategy of each iteration is weighted by player $i$’s contribution $\pi_i^{\sigma^t}(I)$ to the probability of reaching $I$.</p>
|
|
<p>In a two-player zero-sum game, if $R^T_i < \epsilon$ for both players then the average strategy profile $\bar{\sigma}^T$ is a
|
|
$2\epsilon$-Nash equilibrium.</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">\begin{align}
|
|
R^T_i &< \epsilon \\
|
|
R^T_i &= \frac{1}{T} \max_{\sigma^*_i \in \Sigma_i} \sum_{t=1}^T
|
|
\Big( u_i(\sigma^*_i, \sigma^t_{-i}) - u_i(\sigma^t) \Big) \\
|
|
&= \frac{1}{T} \max_{\sigma^*_i \in \Sigma_i} \sum_{t=1}^T u_i(\sigma^*_i, \sigma^t_{-i})
|
|
- \frac{1}{T} \sum_{t=1}^T u_i(\sigma^t) < \epsilon
|
|
\end{align}</script>
|
|
</p>
|
|
<p>Since $u_1 = -u_2$ because it’s a zero-sum game, we can add $R^T_1$ and $R^T_2$ and the
|
|
second term will cancel out.</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">\begin{align}
|
|
2\epsilon &>
|
|
\frac{1}{T} \max_{\sigma^*_1 \in \Sigma_1} \sum_{t=1}^T u_1(\sigma^*_1, \sigma^t_{-1}) +
|
|
\frac{1}{T} \max_{\sigma^*_2 \in \Sigma_2} \sum_{t=1}^T u_2(\sigma^*_2, \sigma^t_{-2})
|
|
\end{align}</script>
|
|
</p>
|
|
<p>With player $i$’s strategy $\sigma^*_i$ held fixed, the average of the utilities over the iterations
is equal to the utility against the opponent’s average strategy.</p>
<p>
<script type="math/tex; mode=display">\frac{1}{T} \sum_{t=1}^T u_i(\sigma^*_i, \sigma^t_{-i}) = u_i(\sigma^*_i, \bar{\sigma}^T_{-i})</script>
|
|
</p>
|
|
<p>Therefore,
|
|
<script type="math/tex; mode=display">\begin{align}
|
|
2\epsilon &>
|
|
\max_{\sigma^*_1 \in \Sigma_1} u_1(\sigma^*_1, \bar{\sigma}^T_{-1}) +
|
|
\max_{\sigma^*_2 \in \Sigma_2} u_2(\sigma^*_2, \bar{\sigma}^T_{-2})
|
|
\end{align}</script>
|
|
</p>
|
|
<p>From the definition of $\max$,
|
|
<script type="math/tex; mode=display">\max_{\sigma^*_2 \in \Sigma_2} u_2(\sigma^*_2, \bar{\sigma}^T_{-2}) \ge u_2(\bar{\sigma}^T)
|
|
= -u_1(\bar{\sigma}^T)</script>
|
|
</p>
|
|
<p>Then,
|
|
<script type="math/tex; mode=display">\begin{align}
|
|
2\epsilon &>
|
|
\max_{\sigma^*_1 \in \Sigma_1} u_1(\sigma^*_1, \bar{\sigma}^T_{-1}) +
|
|
-u_1(\bar{\sigma}^T) \\
|
|
u_1(\bar{\sigma}^T) + 2\epsilon &> \max_{\sigma^*_1 \in \Sigma_1} u_1(\sigma^*_1, \bar{\sigma}^T_{-1})
|
|
\end{align}</script>
|
|
</p>
|
|
<p>This is a $2\epsilon$-Nash equilibrium; the same argument with the players swapped gives the bound for player 2.
|
|
You can similarly prove for games with more than 2 players.</p>
|
|
<p>So we need to minimize $R^T_i$ to get close to a Nash equilibrium.</p>
|
|
<p><a id="CounterfactualRegret"></a></p>
|
|
<h3>Counterfactual regret</h3>
|
|
<p><strong>Counterfactual value</strong> $\color{pink}{v_i(\sigma, I)}$ is the expected utility for player $i$ if
player $i$ tried to reach $I$ (took the actions leading to $I$ with a probability of $1$).</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">\color{pink}{v_i(\sigma, I)} = \sum_{z \in Z_I} \pi^\sigma_{-i}(z[I]) \pi^\sigma(z[I], z) u_i(z)</script>
|
|
</p>
|
|
<p>where $Z_I$ is the set of terminal histories reachable from $I$,
|
|
and $z[I]$ is the prefix of $z$ up to $I$.
|
|
$\pi^\sigma(z[I], z)$ is the probability of reaching $z$ from $z[I]$.</p>
|
|
<p><strong>Immediate counterfactual regret</strong> is,</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">R^T_{i,imm}(I) = \max_{a \in A(I)} R^T_{i,imm}(I, a)</script>
|
|
</p>
|
|
<p>where</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">R^T_{i,imm}(I, a) = \frac{1}{T} \sum_{t=1}^T
|
|
\Big(
|
|
\color{pink}{v_i(\sigma^t |_{I \rightarrow a}, I)} - \color{pink}{v_i(\sigma^t, I)}
|
|
\Big)</script>
|
|
</p>
|
|
<p>where $\sigma |_{I \rightarrow a}$ is the strategy profile $\sigma$ with the modification
|
|
of always taking action $a$ at information set $I$.</p>
|
|
<p>The <a href="http://martin.zinkevich.org/publications/regretpoker.pdf">paper</a> proves that (Theorem 3),</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">R^T_i \le \sum_{I \in \mathcal{I}} R^{T,+}_{i,imm}(I)</script>
|
|
where <script type="math/tex; mode=display">R^{T,+}_{i,imm}(I) = \max(R^T_{i,imm}(I), 0)</script>
|
|
</p>
|
|
<p><a id="RegretMatching"></a></p>
|
|
<h3>Regret Matching</h3>
|
|
<p>The strategy is calculated using regret matching.</p>
|
|
<p>The regret for each information set and action pair $\color{orange}{R^T_i(I, a)}$ is maintained,</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">\begin{align}
|
|
\color{coral}{r^t_i(I, a)} &=
|
|
\color{pink}{v_i(\sigma^t |_{I \rightarrow a}, I)} - \color{pink}{v_i(\sigma^t, I)}
|
|
\\
|
|
\color{orange}{R^T_i(I, a)} &=
|
|
\frac{1}{T} \sum_{t=1}^T \color{coral}{r^t_i(I, a)}
|
|
\end{align}</script>
|
|
</p>
|
|
<p>and the strategy is calculated with regret matching,</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">\begin{align}
|
|
\color{lightgreen}{\sigma_i^{T+1}(I)(a)} =
|
|
\begin{cases}
|
|
\frac{\color{orange}{R^{T,+}_i(I, a)}}{\sum_{a'\in A(I)}\color{orange}{R^{T,+}_i(I, a')}},
|
|
& \text{if} \sum_{a'\in A(I)}\color{orange}{R^{T,+}_i(I, a')} \gt 0 \\
|
|
\frac{1}{\lvert A(I) \rvert},
|
|
& \text{otherwise}
|
|
\end{cases}
|
|
\end{align}</script>
|
|
</p>
|
|
<p>where $\color{orange}{R^{T,+}_i(I, a)} = \max \Big(\color{orange}{R^T_i(I, a)}, 0 \Big)$</p>
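<p>The regret matching rule above can be sketched as a small standalone function. This is an illustration under the assumption that regrets are stored in a dictionary keyed by action; the function name and the action labels are hypothetical and do not mirror the code that follows.</p>

```python
from typing import Dict


def regret_matching(regret: Dict[str, float]) -> Dict[str, float]:
    """Turn regrets R(I, a) into a strategy sigma(I)(a) for a single information set."""
    # R^+(I, a) = max(R(I, a), 0)
    positive = {a: max(r, 0.0) for a, r in regret.items()}
    total = sum(positive.values())
    if total > 0:
        # Play each action in proportion to its positive regret
        return {a: r / total for a, r in positive.items()}
    # No action has positive regret: fall back to the uniform strategy
    return {a: 1.0 / len(regret) for a in regret}


strategy = regret_matching({'pass': 3.0, 'bet': 1.0, 'fold': -2.0})
fallback = regret_matching({'pass': -1.0, 'bet': 0.0})
```

<p>Actions with negative regret get zero probability, and the uniform fallback is used only when no action has positive regret.</p>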
|
|
<p>The paper
|
|
<a href="http://martin.zinkevich.org/publications/regretpoker.pdf">Regret Minimization in Games with Incomplete Information</a>
|
|
proves that if the strategy is selected according to the above equation,
$R^T_i$ shrinks in proportion to $\frac{1}{\sqrt T}$, and
|
|
therefore reaches $\epsilon$-<a href="#NashEquilibrium">Nash equilibrium</a>.</p>
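<p>To see this behaviour on a concrete game, the sketch below runs regret matching in self-play on rock-paper-scissors. The game, the starting regrets and all names are assumptions made for this example. Since the game has a single decision per player, every information set is reached with probability $1$ and the average strategy is a plain mean over iterations. Exact expected action values are used, so nothing is sampled.</p>

```python
from typing import List

# Payoff of player 0 in rock-paper-scissors; player 1 receives the negative (zero-sum)
PAYOFF = [
    [0.0, -1.0, 1.0],   # rock     vs rock, paper, scissors
    [1.0, 0.0, -1.0],   # paper
    [-1.0, 1.0, 0.0],   # scissors
]
N_ACTIONS = 3


def regret_matching(regret: List[float]) -> List[float]:
    positive = [max(r, 0.0) for r in regret]
    total = sum(positive)
    if total > 0:
        return [r / total for r in positive]
    return [1.0 / len(regret)] * len(regret)


def action_values(player: int, opponent_strategy: List[float]) -> List[float]:
    """Expected utility of each action of `player` against the opponent's strategy."""
    if player == 0:
        return [sum(PAYOFF[a][b] * opponent_strategy[b] for b in range(N_ACTIONS))
                for a in range(N_ACTIONS)]
    return [sum(-PAYOFF[a][b] * opponent_strategy[a] for a in range(N_ACTIONS))
            for b in range(N_ACTIONS)]


def self_play(iterations: int) -> List[List[float]]:
    """Run regret matching for both players and return their average strategies."""
    # Start player 0 with a non-zero regret so that play does not begin at the equilibrium
    regret = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    cumulative = [[0.0] * N_ACTIONS for _ in range(2)]
    for _ in range(iterations):
        strategy = [regret_matching(r) for r in regret]
        for p in range(2):
            values = action_values(p, strategy[1 - p])
            expected = sum(s * v for s, v in zip(strategy[p], values))
            for a in range(N_ACTIONS):
                regret[p][a] += values[a] - expected  # accumulates T * R^T(a)
                cumulative[p][a] += strategy[p][a]
    return [[c / iterations for c in cum] for cum in cumulative]


def exploitability(average: List[List[float]]) -> float:
    """Sum of both players' best-response values against the other's average strategy."""
    return max(action_values(0, average[1])) + max(action_values(1, average[0]))


average = self_play(20_000)
gap = exploitability(average)
```

<p>The value of <code>gap</code> is bounded by the sum of the two players' average regrets, so it shrinks as the number of iterations grows.</p>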
|
|
<p><a id="MCCFR"></a></p>
|
|
<h3>Monte Carlo CFR (MCCFR)</h3>
|
|
<p>Computing $\color{coral}{r^t_i(I, a)}$ requires expanding the full game tree
|
|
on each iteration.</p>
|
|
<p>The paper
|
|
<a href="http://mlanctot.info/files/papers/nips09mccfr.pdf">Monte Carlo Sampling for Regret Minimization in Extensive Games</a>
|
|
shows we can sample from the game tree and estimate the regrets.</p>
|
|
<p>$\mathcal{Q} = \{Q_1, \ldots, Q_r\}$ is a set of subsets of $Z$ ($Q_j \subseteq Z$) where
|
|
we look at only a single block $Q_j$ in an iteration.
|
|
The union of all subsets spans $Z$ ($Q_1 \cup \ldots \cup Q_r = Z$).
|
|
$q_j$ is the probability of picking block $Q_j$.</p>
|
|
<p>$q(z)$ is the probability of picking $z$ in current iteration; i.e. $q(z) = \sum_{j:z \in Q_j} q_j$ -
|
|
the sum of $q_j$ where $z \in Q_j$.</p>
|
|
<p>Then we get the <strong>sampled counterfactual value</strong> for block $j$,</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">\color{pink}{\tilde{v}_i(\sigma, I|j)} =
\sum_{z \in Q_j \cap Z_I} \frac{1}{q(z)}
|
|
\pi^\sigma_{-i}(z[I]) \pi^\sigma(z[I], z) u_i(z)</script>
|
|
</p>
|
|
<p>The paper shows that</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">\mathbb{E}_{j \sim q_j} \Big[ \color{pink}{\tilde{v}_i(\sigma, I|j)} \Big]
|
|
= \color{pink}{v_i(\sigma, I)}</script>
|
|
</p>
|
|
<p>with a simple proof.</p>
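<p>The unbiasedness can also be checked numerically. In the sketch below each terminal history $z$ has a made-up contribution $\pi^\sigma_{-i}(z[I]) \pi^\sigma(z[I], z) u_i(z)$, and the blocks $Q_j$ overlap on purpose. The numbers and names are assumptions for illustration only.</p>

```python
from typing import Dict, List, Tuple

# Hypothetical contribution of each terminal history z reachable from I:
# pi_{-i}(z[I]) * pi(z[I], z) * u_i(z)
contribution: Dict[str, float] = {'z1': 0.3, 'z2': -0.1, 'z3': 0.25}

# Blocks Q_j with their sampling probabilities q_j; the blocks cover all of Z
blocks: List[Tuple[Tuple[str, ...], float]] = [
    (('z1', 'z2'), 0.5),
    (('z2', 'z3'), 0.3),
    (('z3',), 0.2),
]


def q(z: str) -> float:
    """Probability that z is included in the sampled block."""
    return sum(q_j for block, q_j in blocks if z in block)


def sampled_value(block: Tuple[str, ...]) -> float:
    """Sampled counterfactual value when this block is picked."""
    return sum(contribution[z] / q(z) for z in block)


true_value = sum(contribution.values())
expected_sampled = sum(q_j * sampled_value(block) for block, q_j in blocks)
```

<p>Dividing by $q(z)$ compensates for histories such as <code>z2</code> that appear in more than one block, which is why the expectation matches the exact value.</p>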
|
|
<p>Therefore we can sample a part of the game tree and calculate the regrets.
|
|
We calculate an estimate of regrets</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">
|
|
\color{coral}{\tilde{r}^t_i(I, a)} =
|
|
\color{pink}{\tilde{v}_i(\sigma^t |_{I \rightarrow a}, I)} - \color{pink}{\tilde{v}_i(\sigma^t, I)}
|
|
</script>
|
|
</p>
|
|
<p>And use that to update $\color{orange}{R^T_i(I, a)}$ and calculate
|
|
the strategy $\color{lightgreen}{\sigma_i^{T+1}(I)(a)}$ on each iteration.
|
|
Finally, we calculate the overall average strategy $\color{cyan}{\bar{\sigma}^T_i(I)(a)}$.</p>
|
|
<p>Here is a <a href="kuhn/index.html">Kuhn Poker</a> implementation to try CFR on Kuhn Poker.</p>
|
|
<p><em>Let’s dive into the code!</em></p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">320</span><span></span><span class="kn">from</span> <span class="nn">typing</span> <span class="kn">import</span> <span class="n">NewType</span><span class="p">,</span> <span class="n">Dict</span><span class="p">,</span> <span class="n">List</span><span class="p">,</span> <span class="n">Callable</span><span class="p">,</span> <span class="n">cast</span>
|
|
<span class="lineno">321</span>
|
|
<span class="lineno">322</span><span class="kn">from</span> <span class="nn">labml</span> <span class="kn">import</span> <span class="n">monit</span><span class="p">,</span> <span class="n">tracker</span><span class="p">,</span> <span class="n">logger</span><span class="p">,</span> <span class="n">experiment</span>
|
|
<span class="lineno">323</span><span class="kn">from</span> <span class="nn">labml.configs</span> <span class="kn">import</span> <span class="n">BaseConfigs</span><span class="p">,</span> <span class="n">option</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-1'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-1'>#</a>
|
|
</div>
|
|
<p>A player $i \in N$ where $N$ is the set of players</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">326</span><span class="n">Player</span> <span class="o">=</span> <span class="n">NewType</span><span class="p">(</span><span class="s1">'Player'</span><span class="p">,</span> <span class="nb">int</span><span class="p">)</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-2'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-2'>#</a>
|
|
</div>
|
|
<p>Action $a$, $A(h) = \{a: (h, a) \in H\}$ where $h \in H$ is a non-terminal <a href="#History">history</a></p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">328</span><span class="n">Action</span> <span class="o">=</span> <span class="n">NewType</span><span class="p">(</span><span class="s1">'Action'</span><span class="p">,</span> <span class="nb">str</span><span class="p">)</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-3'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-3'>#</a>
|
|
</div>
|
|
<p><a id="History"></a></p>
|
|
<h2>History</h2>
|
|
<p>History $h \in H$ is a sequence of actions including chance events,
|
|
and $H$ is the set of all histories.</p>
|
|
<p>This class should be extended with game specific logic.</p>
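<p>To give a feel for what game specific logic means, here is a standalone sketch of a toy game that uses the same method names as the class below. It does not inherit from the class on this page, and the game itself (chance flips a coin, then player 0 guesses the outcome without seeing it) is an assumption made up for this example.</p>

```python
import random


class CoinGuessHistory:
    """Hypothetical toy game: chance flips a coin, then player 0 guesses it unseen."""

    def __init__(self, history: str = ''):
        # '' before the flip, 'H' or 'T' after the flip, e.g. 'HT' after the guess
        self.history = history

    def is_terminal(self) -> bool:
        return len(self.history) == 2

    def terminal_utility(self, i: int) -> float:
        # Player 0 wins 1 on a correct guess; any other player gets the negative
        u_0 = 1.0 if self.history[0] == self.history[1] else -1.0
        return u_0 if i == 0 else -u_0

    def is_chance(self) -> bool:
        return len(self.history) == 0

    def sample_chance(self) -> str:
        return random.choice('HT')

    def __add__(self, action: str) -> 'CoinGuessHistory':
        return CoinGuessHistory(self.history + action)

    def info_set_key(self) -> str:
        # The flip is hidden from the player, so both flips share one information set
        return 'guess'


root = CoinGuessHistory()
after_heads = root + 'H'
after_tails = root + 'T'
```

<p>Both <code>after_heads</code> and <code>after_tails</code> map to the same information set key, because the player cannot tell them apart.</p>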
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">331</span><span class="k">class</span> <span class="nc">History</span><span class="p">:</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-4'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-4'>#</a>
|
|
</div>
|
|
<p>Whether it’s a terminal history; i.e. game over.
|
|
$h \in Z$</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">342</span> <span class="k">def</span> <span class="nf">is_terminal</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-5'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-5'>#</a>
|
|
</div>
|
|
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">347</span> <span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">()</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-6'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-6'>#</a>
|
|
</div>
|
|
<p><a id="terminal_utility"></a>
|
|
Utility of player $i$ for a terminal history.
|
|
$u_i(h)$ where $h \in Z$</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">349</span> <span class="k">def</span> <span class="nf">terminal_utility</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">i</span><span class="p">:</span> <span class="n">Player</span><span class="p">)</span> <span class="o">-></span> <span class="nb">float</span><span class="p">:</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-7'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-7'>#</a>
|
|
</div>
|
|
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">355</span> <span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">()</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-8'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-8'>#</a>
|
|
</div>
|
|
<p>Get current player, denoted by $P(h)$, where $P$ is known as <strong>Player function</strong>.</p>
|
|
<p>If $P(h) = c$ it means that the current event is a chance event, denoted by $c$.
|
|
Something like dealing cards, or opening common cards in poker.</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">357</span> <span class="k">def</span> <span class="nf">player</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-></span> <span class="n">Player</span><span class="p">:</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-9'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-9'>#</a>
|
|
</div>
|
|
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">364</span> <span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">()</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-10'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-10'>#</a>
|
|
</div>
|
|
<p>Whether the next step is a chance step; something like dealing a new card.
|
|
$P(h) = c$</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">366</span> <span class="k">def</span> <span class="nf">is_chance</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-></span> <span class="nb">bool</span><span class="p">:</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-11'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-11'>#</a>
|
|
</div>
|
|
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">371</span> <span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">()</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-12'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-12'>#</a>
|
|
</div>
|
|
<p>Sample a chance when $P(h) = c$.</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">373</span> <span class="k">def</span> <span class="nf">sample_chance</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-></span> <span class="n">Action</span><span class="p">:</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-13'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-13'>#</a>
|
|
</div>
|
|
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">377</span> <span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">()</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-14'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-14'>#</a>
|
|
</div>
|
|
<p>Add an action to the history.</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">379</span> <span class="k">def</span> <span class="fm">__add__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">action</span><span class="p">:</span> <span class="n">Action</span><span class="p">):</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-15'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-15'>#</a>
|
|
</div>
|
|
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">383</span> <span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">()</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-16'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-16'>#</a>
|
|
</div>
|
|
<p>Get <a href="#InfoSet">information set</a> for the current player</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">385</span> <span class="k">def</span> <span class="nf">info_set_key</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-></span> <span class="nb">str</span><span class="p">:</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-17'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-17'>#</a>
|
|
</div>
|
|
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">389</span> <span class="k">raise</span> <span class="ne">NotImplementedError</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-18'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-18'>#</a>
|
|
</div>
|
|
<p>Create a new <a href="#InfoSet">information set</a> for the current player</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">391</span> <span class="k">def</span> <span class="nf">new_info_set</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-></span> <span class="s1">'InfoSet'</span><span class="p">:</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-19'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-19'>#</a>
|
|
</div>
|
|
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">395</span> <span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">()</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-20'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-20'>#</a>
|
|
</div>
|
|
<p>Human readable representation</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">397</span> <span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-21'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-21'>#</a>
|
|
</div>
|
|
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">401</span> <span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">()</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-22'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-22'>#</a>
|
|
</div>
|
|
<p><a id="InfoSet"></a></p>
|
|
<h2>Information Set $I_i$</h2>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">404</span><span class="k">class</span> <span class="nc">InfoSet</span><span class="p">:</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-23'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-23'>#</a>
|
|
</div>
|
|
<p>Unique key identifying the information set</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">411</span> <span class="n">key</span><span class="p">:</span> <span class="nb">str</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-24'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-24'>#</a>
|
|
</div>
|
|
<p>$\sigma_i$, the <a href="#Strategy">strategy</a> of player $i$</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">413</span> <span class="n">strategy</span><span class="p">:</span> <span class="n">Dict</span><span class="p">[</span><span class="n">Action</span><span class="p">,</span> <span class="nb">float</span><span class="p">]</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-25'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-25'>#</a>
|
|
</div>
|
|
<p>Total regret of not taking each action $a \in A(I_i)$,</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">\begin{align}
|
|
\color{coral}{\tilde{r}^t_i(I, a)} &=
|
|
\color{pink}{\tilde{v}_i(\sigma^t |_{I \rightarrow a}, I)} -
|
|
\color{pink}{\tilde{v}_i(\sigma^t, I)}
|
|
\\
|
|
\color{orange}{R^T_i(I, a)} &=
|
|
\frac{1}{T} \sum_{t=1}^T \color{coral}{\tilde{r}^t_i(I, a)}
|
|
\end{align}</script>
|
|
</p>
|
|
<p>We maintain $T \color{orange}{R^T_i(I, a)}$ instead of $\color{orange}{R^T_i(I, a)}$
|
|
since the $\frac{1}{T}$ term cancels out anyway when computing the strategy
|
|
$\color{lightgreen}{\sigma_i^{T+1}(I)(a)}$</p>
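<p>A quick check of why the scaling does not matter: regret matching normalizes by the sum of positive regrets, so multiplying every regret by $T$ leaves the strategy unchanged. The snippet is a sketch with made-up numbers and a hypothetical helper, not the method defined below.</p>

```python
from typing import Dict


def regret_matching(regret: Dict[str, float]) -> Dict[str, float]:
    """Strategy proportional to positive regrets, uniform if none is positive."""
    positive = {a: max(r, 0.0) for a, r in regret.items()}
    total = sum(positive.values())
    if total > 0:
        return {a: r / total for a, r in positive.items()}
    return {a: 1.0 / len(regret) for a in regret}


T = 100
mean_regret = {'pass': 0.03, 'bet': 0.01, 'fold': -0.02}   # R^T_i(I, a)
total_regret = {a: T * r for a, r in mean_regret.items()}  # T * R^T_i(I, a)

from_mean = regret_matching(mean_regret)
from_total = regret_matching(total_regret)
```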
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">428</span> <span class="n">regret</span><span class="p">:</span> <span class="n">Dict</span><span class="p">[</span><span class="n">Action</span><span class="p">,</span> <span class="nb">float</span><span class="p">]</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-26'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-26'>#</a>
|
|
</div>
|
|
<p>We maintain the cumulative strategy
|
|
<script type="math/tex; mode=display">\sum_{t=1}^T \pi_i^{\sigma^t}(I)\color{lightgreen}{\sigma^t(I)(a)}</script>
|
|
 to compute the overall average strategy</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">\color{cyan}{\bar{\sigma}^T_i(I)(a)} =
|
|
\frac{\sum_{t=1}^T \pi_i^{\sigma^t}(I)\color{lightgreen}{\sigma^t(I)(a)}}{\sum_{t=1}^T \pi_i^{\sigma^t}(I)}</script>
|
|
</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">435</span> <span class="n">cumulative_strategy</span><span class="p">:</span> <span class="n">Dict</span><span class="p">[</span><span class="n">Action</span><span class="p">,</span> <span class="nb">float</span><span class="p">]</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-27'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-27'>#</a>
|
|
</div>
|
|
<p>Initialize</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">437</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">key</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-28'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-28'>#</a>
|
|
</div>
|
|
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">441</span> <span class="bp">self</span><span class="o">.</span><span class="n">key</span> <span class="o">=</span> <span class="n">key</span>
|
|
<span class="lineno">442</span> <span class="bp">self</span><span class="o">.</span><span class="n">regret</span> <span class="o">=</span> <span class="p">{</span><span class="n">a</span><span class="p">:</span> <span class="mi">0</span> <span class="k">for</span> <span class="n">a</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">actions</span><span class="p">()}</span>
|
|
<span class="lineno">443</span> <span class="bp">self</span><span class="o">.</span><span class="n">cumulative_strategy</span> <span class="o">=</span> <span class="p">{</span><span class="n">a</span><span class="p">:</span> <span class="mi">0</span> <span class="k">for</span> <span class="n">a</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">actions</span><span class="p">()}</span>
|
|
<span class="lineno">444</span> <span class="bp">self</span><span class="o">.</span><span class="n">calculate_strategy</span><span class="p">()</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-29'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-29'>#</a>
|
|
</div>
|
|
<p>Actions $A(I_i)$</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">446</span> <span class="k">def</span> <span class="nf">actions</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="o">-></span> <span class="n">List</span><span class="p">[</span><span class="n">Action</span><span class="p">]:</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-30'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-30'>#</a>
|
|
</div>
|
|
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">450</span> <span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">()</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-31'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-31'>#</a>
|
|
</div>
|
|
<p>Load information set from a saved dictionary</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">452</span> <span class="nd">@staticmethod</span>
|
|
<span class="lineno">453</span> <span class="k">def</span> <span class="nf">from_dict</span><span class="p">(</span><span class="n">data</span><span class="p">:</span> <span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">any</span><span class="p">])</span> <span class="o">-></span> <span class="s1">'InfoSet'</span><span class="p">:</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-32'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-32'>#</a>
|
|
</div>
|
|
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">457</span> <span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">()</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-33'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-33'>#</a>
|
|
</div>
|
|
<p>Save the information set to a dictionary</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">459</span> <span class="k">def</span> <span class="nf">to_dict</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-34'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-34'>#</a>
|
|
</div>
|
|
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">463</span> <span class="k">return</span> <span class="p">{</span>
|
|
<span class="lineno">464</span> <span class="s1">'key'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">key</span><span class="p">,</span>
|
|
<span class="lineno">465</span> <span class="s1">'regret'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">regret</span><span class="p">,</span>
|
|
<span class="lineno">466</span> <span class="s1">'average_strategy'</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">cumulative_strategy</span><span class="p">,</span>
|
|
<span class="lineno">467</span> <span class="p">}</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-35'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-35'>#</a>
|
|
</div>
|
|
<p>Load data from a saved dictionary</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">469</span> <span class="k">def</span> <span class="nf">load_dict</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">data</span><span class="p">:</span> <span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="nb">any</span><span class="p">]):</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-36'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-36'>#</a>
|
|
</div>
|
|
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">473</span> <span class="bp">self</span><span class="o">.</span><span class="n">regret</span> <span class="o">=</span> <span class="n">data</span><span class="p">[</span><span class="s1">'regret'</span><span class="p">]</span>
|
|
<span class="lineno">474</span> <span class="bp">self</span><span class="o">.</span><span class="n">cumulative_strategy</span> <span class="o">=</span> <span class="n">data</span><span class="p">[</span><span class="s1">'average_strategy'</span><span class="p">]</span>
|
|
<span class="lineno">475</span> <span class="bp">self</span><span class="o">.</span><span class="n">calculate_strategy</span><span class="p">()</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-37'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-37'>#</a>
|
|
</div>
|
|
<h2>Calculate strategy</h2>
|
|
<p>Calculate current strategy using <a href="#RegretMatching">regret matching</a>.</p>
|
|
<p>
|
|
<script type="math/tex; mode=display">\begin{align}
|
|
\color{lightgreen}{\sigma_i^{T+1}(I)(a)} =
|
|
\begin{cases}
|
|
\frac{\color{orange}{R^{T,+}_i(I, a)}}{\sum_{a'\in A(I)}\color{orange}{R^{T,+}_i(I, a')}},
|
|
& \text{if} \sum_{a'\in A(I)}\color{orange}{R^{T,+}_i(I, a')} \gt 0 \\
|
|
\frac{1}{\lvert A(I) \rvert},
|
|
& \text{otherwise}
|
|
\end{cases}
|
|
\end{align}</script>
|
|
</p>
|
|
<p>where $\color{orange}{R^{T,+}_i(I, a)} = \max \Big(\color{orange}{R^T_i(I, a)}, 0 \Big)$</p>
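     <p>The rule above can be sketched as a standalone function. This is a minimal illustration,
 not the class method documented here: the function name <code>regret_matching</code>, the use of
 plain strings as actions, and the sample regret values are all assumptions.</p>

```python
from typing import Dict


def regret_matching(regret: Dict[str, float]) -> Dict[str, float]:
    """Return a strategy proportional to the positive part of each regret."""
    # R^{T,+}(I, a) = max(R^T(I, a), 0)
    positive = {a: max(r, 0.0) for a, r in regret.items()}
    regret_sum = sum(positive.values())
    if regret_sum > 0:
        return {a: r / regret_sum for a, r in positive.items()}
    # No action has positive regret: fall back to the uniform strategy
    return {a: 1.0 / len(positive) for a in positive}


# Hypothetical regrets (action names chosen only for illustration)
mixed = regret_matching({'bet': 3.0, 'pass': -1.0, 'fold': 1.0})
uniform = regret_matching({'bet': -2.0, 'pass': 0.0})
print(mixed)    # {'bet': 0.75, 'pass': 0.0, 'fold': 0.25}
print(uniform)  # {'bet': 0.5, 'pass': 0.5}
```

     <p>The first call exercises the proportional branch and the second the uniform fallback.</p>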
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">477</span> <span class="k">def</span> <span class="nf">calculate_strategy</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-38'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-38'>#</a>
|
|
</div>
|
|
<p>
|
|
<script type="math/tex; mode=display">\color{orange}{R^{T,+}_i(I, a)} = \max \Big(\color{orange}{R^T_i(I, a)}, 0 \Big)</script>
|
|
</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">496</span> <span class="n">regret</span> <span class="o">=</span> <span class="p">{</span><span class="n">a</span><span class="p">:</span> <span class="nb">max</span><span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="k">for</span> <span class="n">a</span><span class="p">,</span> <span class="n">r</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">regret</span><span class="o">.</span><span class="n">items</span><span class="p">()}</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-39'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-39'>#</a>
|
|
</div>
|
|
<p>
|
|
<script type="math/tex; mode=display">\sum_{a'\in A(I)}\color{orange}{R^{T,+}_i(I, a')}</script>
|
|
</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">498</span> <span class="n">regret_sum</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">regret</span><span class="o">.</span><span class="n">values</span><span class="p">())</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-40'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-40'>#</a>
|
|
</div>
|
|
                <p>If $\sum_{a'\in A(I)}\color{orange}{R^{T,+}_i(I, a')} \gt 0$,</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">500</span> <span class="k">if</span> <span class="n">regret_sum</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-41'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-41'>#</a>
|
|
</div>
|
|
<p>
|
|
<script type="math/tex; mode=display">\color{lightgreen}{\sigma_i^{T+1}(I)(a)} =
|
|
\frac{\color{orange}{R^{T,+}_i(I, a)}}{\sum_{a'\in A(I)}\color{orange}{R^{T,+}_i(I, a')}}</script>
|
|
</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">503</span> <span class="bp">self</span><span class="o">.</span><span class="n">strategy</span> <span class="o">=</span> <span class="p">{</span><span class="n">a</span><span class="p">:</span> <span class="n">r</span> <span class="o">/</span> <span class="n">regret_sum</span> <span class="k">for</span> <span class="n">a</span><span class="p">,</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">regret</span><span class="o">.</span><span class="n">items</span><span class="p">()}</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-42'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-42'>#</a>
|
|
</div>
|
|
<p>Otherwise,</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">505</span> <span class="k">else</span><span class="p">:</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-43'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-43'>#</a>
|
|
</div>
|
|
<p>$\lvert A(I) \rvert$</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">507</span> <span class="n">count</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">a</span> <span class="k">for</span> <span class="n">a</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">regret</span><span class="p">))</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-44'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-44'>#</a>
|
|
</div>
|
|
<p>
|
|
<script type="math/tex; mode=display">\color{lightgreen}{\sigma_i^{T+1}(I)(a)} =
|
|
\frac{1}{\lvert A(I) \rvert}</script>
|
|
</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">510</span> <span class="bp">self</span><span class="o">.</span><span class="n">strategy</span> <span class="o">=</span> <span class="p">{</span><span class="n">a</span><span class="p">:</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">count</span> <span class="k">for</span> <span class="n">a</span><span class="p">,</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">regret</span><span class="o">.</span><span class="n">items</span><span class="p">()}</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-45'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-45'>#</a>
|
|
</div>
|
|
<h2>Get average strategy</h2>
|
|
<p>
|
|
<script type="math/tex; mode=display">\color{cyan}{\bar{\sigma}^T_i(I)(a)} =
|
|
\frac{\sum_{t=1}^T \pi_i^{\sigma^t}(I)\color{lightgreen}{\sigma^t(I)(a)}}
|
|
{\sum_{t=1}^T \pi_i^{\sigma^t}(I)}</script>
|
|
</p>
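     <p>A minimal sketch of this weighted average, assuming each iteration is given as a pair of
 the player's reach probability $\pi_i^{\sigma^t}(I)$ and the strategy $\sigma^t(I)$. The function
 name <code>average_strategy</code>, the <code>steps</code> argument and the numbers are
 hypothetical and only illustrate the formula.</p>

```python
from typing import Dict, List, Tuple


def average_strategy(steps: List[Tuple[float, Dict[str, float]]]) -> Dict[str, float]:
    """Average per-iteration strategies, weighting each by the reach probability."""
    cumulative: Dict[str, float] = {}
    for reach, strategy in steps:
        for a, p in strategy.items():
            # Accumulate pi_i(I) * sigma^t(I)(a)
            cumulative[a] = cumulative.get(a, 0.0) + reach * p
    # Each strategy sums to 1, so this equals the sum of the reach probabilities
    strategy_sum = sum(cumulative.values())
    if strategy_sum > 0:
        return {a: s / strategy_sum for a, s in cumulative.items()}
    return {a: 1.0 / len(cumulative) for a in cumulative}


# Hypothetical iterations: (reach probability, strategy at that iteration)
steps = [
    (1.0, {'bet': 0.5, 'pass': 0.5}),
    (0.5, {'bet': 1.0, 'pass': 0.0}),
    (0.5, {'bet': 1.0, 'pass': 0.0}),
]
avg = average_strategy(steps)
print(avg)  # {'bet': 0.75, 'pass': 0.25}
```

     <p>Iterations in which the player is unlikely to reach $I$ contribute less to the average,
 which is why the reach probability appears as a weight.</p>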
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">512</span> <span class="k">def</span> <span class="nf">get_average_strategy</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-46'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-46'>#</a>
|
|
</div>
|
|
<p>
|
|
<script type="math/tex; mode=display">\sum_{t=1}^T \pi_i^{\sigma^t}(I) \color{lightgreen}{\sigma^t(I)(a)}</script>
|
|
</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">521</span> <span class="n">cum_strategy</span> <span class="o">=</span> <span class="p">{</span><span class="n">a</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">cumulative_strategy</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="n">a</span><span class="p">,</span> <span class="mf">0.</span><span class="p">)</span> <span class="k">for</span> <span class="n">a</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">actions</span><span class="p">()}</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-47'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-47'>#</a>
|
|
</div>
|
|
<p>
|
|
<script type="math/tex; mode=display">\sum_{t=1}^T \pi_i^{\sigma^t}(I) =
|
|
\sum_{a \in A(I)} \sum_{t=1}^T
|
|
\pi_i^{\sigma^t}(I)\color{lightgreen}{\sigma^t(I)(a)}</script>
|
|
</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">525</span> <span class="n">strategy_sum</span> <span class="o">=</span> <span class="nb">sum</span><span class="p">(</span><span class="n">cum_strategy</span><span class="o">.</span><span class="n">values</span><span class="p">())</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-48'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-48'>#</a>
|
|
</div>
|
|
<p>If $\sum_{t=1}^T \pi_i^{\sigma^t}(I) > 0$,</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">527</span> <span class="k">if</span> <span class="n">strategy_sum</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-49'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-49'>#</a>
|
|
</div>
|
|
<p>
|
|
<script type="math/tex; mode=display">\color{cyan}{\bar{\sigma}^T_i(I)(a)} =
|
|
\frac{\sum_{t=1}^T \pi_i^{\sigma^t}(I)\color{lightgreen}{\sigma^t(I)(a)}}
|
|
{\sum_{t=1}^T \pi_i^{\sigma^t}(I)}</script>
|
|
</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">531</span> <span class="k">return</span> <span class="p">{</span><span class="n">a</span><span class="p">:</span> <span class="n">s</span> <span class="o">/</span> <span class="n">strategy_sum</span> <span class="k">for</span> <span class="n">a</span><span class="p">,</span> <span class="n">s</span> <span class="ow">in</span> <span class="n">cum_strategy</span><span class="o">.</span><span class="n">items</span><span class="p">()}</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-50'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-50'>#</a>
|
|
</div>
|
|
<p>Otherwise,</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">533</span> <span class="k">else</span><span class="p">:</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-51'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-51'>#</a>
|
|
</div>
|
|
<p>$\lvert A(I) \rvert$</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">535</span> <span class="n">count</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="nb">list</span><span class="p">(</span><span class="n">a</span> <span class="k">for</span> <span class="n">a</span> <span class="ow">in</span> <span class="n">cum_strategy</span><span class="p">))</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-52'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-52'>#</a>
|
|
</div>
|
|
<p>
|
|
<script type="math/tex; mode=display">\color{cyan}{\bar{\sigma}^T_i(I)(a)} =
|
|
\frac{1}{\lvert A(I) \rvert}</script>
|
|
</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">538</span> <span class="k">return</span> <span class="p">{</span><span class="n">a</span><span class="p">:</span> <span class="mi">1</span> <span class="o">/</span> <span class="n">count</span> <span class="k">for</span> <span class="n">a</span><span class="p">,</span> <span class="n">r</span> <span class="ow">in</span> <span class="n">cum_strategy</span><span class="o">.</span><span class="n">items</span><span class="p">()}</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-53'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-53'>#</a>
|
|
</div>
|
|
<p>Human readable representation</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">540</span> <span class="k">def</span> <span class="fm">__repr__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-54'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-54'>#</a>
|
|
</div>
|
|
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">544</span> <span class="k">raise</span> <span class="ne">NotImplementedError</span><span class="p">()</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-55'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-55'>#</a>
|
|
</div>
|
|
<h2>Counterfactual Regret Minimization (CFR) Algorithm</h2>
|
|
<p>We do chance sampling (<strong>CS</strong>) where all the chance events (nodes) are sampled and
|
|
all other events (nodes) are explored.</p>
|
|
     <p>We can ignore the term $q(z)$: under chance sampling it is the same for all terminal
 histories, so it cancels out when calculating the strategy
 (it is common to the numerator and the denominator).</p>
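     <p>As an illustration of sampling a chance event, here is a small sketch that assumes a
 hypothetical three-card deck dealt uniformly to two players. The deck, the function name
 <code>sample_chance</code> and the uniform deal are assumptions made for this example only.</p>

```python
import random
from itertools import permutations
from typing import List, Tuple

CARDS = ['J', 'Q', 'K']  # hypothetical three-card deck


def sample_chance(rng: random.Random) -> Tuple[str, ...]:
    """Sample one chance outcome: deal a distinct card to each of two players."""
    return tuple(rng.sample(CARDS, 2))


# All possible deals; under a uniform deal each has the same probability q
deals: List[Tuple[str, ...]] = list(permutations(CARDS, 2))
q = 1.0 / len(deals)

rng = random.Random(0)
deal = sample_chance(rng)
print(deal, q)
```

     <p>Because every deal has the same probability in this setup, dividing by it rescales all
 sampled values by one constant, which is why it can be dropped.</p>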
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">547</span><span class="k">class</span> <span class="nc">CFR</span><span class="p">:</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-56'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-56'>#</a>
|
|
</div>
|
|
                <p>$\mathcal{I}$, the set of all information sets.</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">560</span> <span class="n">info_sets</span><span class="p">:</span> <span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">InfoSet</span><span class="p">]</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-57'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-57'>#</a>
|
|
</div>
|
|
<ul>
|
|
<li><code>create_new_history</code> creates a new empty history</li>
|
|
 <li><code>epochs</code> is the number of iterations $T$ to train for</li>
|
|
<li><code>n_players</code> is the number of players</li>
|
|
</ul>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">562</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="o">*</span><span class="p">,</span>
|
|
<span class="lineno">563</span> <span class="n">create_new_history</span><span class="p">:</span> <span class="n">Callable</span><span class="p">[[],</span> <span class="n">History</span><span class="p">],</span>
|
|
<span class="lineno">564</span> <span class="n">epochs</span><span class="p">:</span> <span class="nb">int</span><span class="p">,</span>
|
|
<span class="lineno">565</span> <span class="n">n_players</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">2</span><span class="p">):</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-58'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-58'>#</a>
|
|
</div>
|
|
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">571</span> <span class="bp">self</span><span class="o">.</span><span class="n">n_players</span> <span class="o">=</span> <span class="n">n_players</span>
|
|
<span class="lineno">572</span> <span class="bp">self</span><span class="o">.</span><span class="n">epochs</span> <span class="o">=</span> <span class="n">epochs</span>
|
|
<span class="lineno">573</span> <span class="bp">self</span><span class="o">.</span><span class="n">create_new_history</span> <span class="o">=</span> <span class="n">create_new_history</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-59'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-59'>#</a>
|
|
</div>
|
|
                <p>A dictionary for $\mathcal{I}$, the set of all information sets</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">575</span> <span class="bp">self</span><span class="o">.</span><span class="n">info_sets</span> <span class="o">=</span> <span class="p">{}</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-60'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-60'>#</a>
|
|
</div>
|
|
<p>Tracker for analytics</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">577</span> <span class="bp">self</span><span class="o">.</span><span class="n">tracker</span> <span class="o">=</span> <span class="n">InfoSetTracker</span><span class="p">()</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-61'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-61'>#</a>
|
|
</div>
|
|
<p>Returns the information set $I$ of the current player for a given history $h$</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">579</span> <span class="k">def</span> <span class="nf">_get_info_set</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">h</span><span class="p">:</span> <span class="n">History</span><span class="p">):</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-62'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-62'>#</a>
|
|
</div>
|
|
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">583</span> <span class="n">info_set_key</span> <span class="o">=</span> <span class="n">h</span><span class="o">.</span><span class="n">info_set_key</span><span class="p">()</span>
|
|
<span class="lineno">584</span> <span class="k">if</span> <span class="n">info_set_key</span> <span class="ow">not</span> <span class="ow">in</span> <span class="bp">self</span><span class="o">.</span><span class="n">info_sets</span><span class="p">:</span>
|
|
<span class="lineno">585</span> <span class="bp">self</span><span class="o">.</span><span class="n">info_sets</span><span class="p">[</span><span class="n">info_set_key</span><span class="p">]</span> <span class="o">=</span> <span class="n">h</span><span class="o">.</span><span class="n">new_info_set</span><span class="p">()</span>
|
|
<span class="lineno">586</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">info_sets</span><span class="p">[</span><span class="n">info_set_key</span><span class="p">]</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-63'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-63'>#</a>
|
|
</div>
|
|
<h3>Walk Tree</h3>
|
|
<p>This function walks the game tree.</p>
|
|
<ul>
|
|
<li><code>h</code> is the current history $h$</li>
|
|
 <li><code>i</code> is the player $i$ whose regrets we are computing</li>
|
|
<li><a href="#HistoryProbability"><code>pi_i</code></a> is
|
|
$\pi^{\sigma^t}_i(h)$</li>
|
|
<li><a href="#HistoryProbability"><code>pi_neg_i</code></a> is
|
|
$\pi^{\sigma^t}_{-i}(h)$</li>
|
|
</ul>
|
|
     <p>It returns the expected utility for the history $h$,
|
|
<script type="math/tex; mode=display">\sum_{z \in Z_h} \pi^\sigma(h, z) u_i(z)</script>
|
|
where $Z_h$ is the set of terminal histories with prefix $h$</p>
|
|
     <p>While walking the tree it updates the total regrets $\color{orange}{R^T_i(I, a)}$.</p>
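     <p>To make the recursion concrete, here is a self-contained sketch on a tiny two-player
 zero-sum game with no chance nodes: player 0 picks <code>H</code> or <code>T</code>, then player 1
 picks without seeing that choice. The game, its payoff table, and the names <code>Node</code>,
 <code>PAYOFF</code> and <code>train</code> are assumptions for illustration. Unlike the
 implementation documented here, the sketch recomputes strategies once at the end of each
 iteration, because player 1's single information set is visited more than once per walk.</p>

```python
from typing import Dict

ACTIONS = ['H', 'T']
# Hypothetical payoffs to player 0, keyed by (player 0 action, player 1 action)
PAYOFF = {('H', 'H'): 2.0, ('H', 'T'): -1.0, ('T', 'H'): -1.0, ('T', 'T'): 1.0}


class Node:
    """Regret and strategy bookkeeping for one information set."""

    def __init__(self):
        self.regret = {a: 0.0 for a in ACTIONS}
        self.cumulative_strategy = {a: 0.0 for a in ACTIONS}
        self.strategy = {a: 1.0 / len(ACTIONS) for a in ACTIONS}

    def calculate_strategy(self):
        # Regret matching: proportional to positive regrets, else uniform
        positive = {a: max(r, 0.0) for a, r in self.regret.items()}
        total = sum(positive.values())
        if total > 0:
            self.strategy = {a: r / total for a, r in positive.items()}
        else:
            self.strategy = {a: 1.0 / len(ACTIONS) for a in ACTIONS}

    def average_strategy(self) -> Dict[str, float]:
        total = sum(self.cumulative_strategy.values())
        return {a: s / total for a, s in self.cumulative_strategy.items()}


def walk_tree(h: str, i: int, pi_i: float, pi_neg_i: float,
              nodes: Dict[int, Node]) -> float:
    """Return player i's expected utility from h, updating player i's regrets."""
    if len(h) == 2:  # terminal history: both players have acted
        u0 = PAYOFF[(h[0], h[1])]
        return u0 if i == 0 else -u0
    player = len(h)  # player 0 acts first, then player 1
    I = nodes[player]  # one information set per player in this game
    v, va = 0.0, {}
    for a in ACTIONS:
        if player == i:
            va[a] = walk_tree(h + a, i, pi_i * I.strategy[a], pi_neg_i, nodes)
        else:
            va[a] = walk_tree(h + a, i, pi_i, pi_neg_i * I.strategy[a], nodes)
        v += I.strategy[a] * va[a]
    if player == i:
        for a in ACTIONS:
            I.cumulative_strategy[a] += pi_i * I.strategy[a]
            # Counterfactual regret, weighted by the other players' reach
            I.regret[a] += pi_neg_i * (va[a] - v)
    return v


def train(iterations: int) -> Dict[int, Node]:
    nodes = {0: Node(), 1: Node()}
    for _ in range(iterations):
        for i in (0, 1):
            walk_tree('', i, 1.0, 1.0, nodes)
        # Update strategies once per iteration, after both players' walks
        for node in nodes.values():
            node.calculate_strategy()
    return nodes


nodes = train(20000)
for player, node in nodes.items():
    print(player, node.average_strategy())
```

     <p>For this particular payoff table the equilibrium has both players choosing
 <code>H</code> with probability $0.4$, so the printed average strategies should be close to that;
 the exact digits depend on the number of iterations.</p>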
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">588</span> <span class="k">def</span> <span class="nf">walk_tree</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">h</span><span class="p">:</span> <span class="n">History</span><span class="p">,</span> <span class="n">i</span><span class="p">:</span> <span class="n">Player</span><span class="p">,</span> <span class="n">pi_i</span><span class="p">:</span> <span class="nb">float</span><span class="p">,</span> <span class="n">pi_neg_i</span><span class="p">:</span> <span class="nb">float</span><span class="p">)</span> <span class="o">-></span> <span class="nb">float</span><span class="p">:</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-64'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-64'>#</a>
|
|
</div>
|
|
<p>If it’s a terminal history $h \in Z$ return the terminal utility $u_i(h)$.</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">609</span> <span class="k">if</span> <span class="n">h</span><span class="o">.</span><span class="n">is_terminal</span><span class="p">():</span>
|
|
<span class="lineno">610</span> <span class="k">return</span> <span class="n">h</span><span class="o">.</span><span class="n">terminal_utility</span><span class="p">(</span><span class="n">i</span><span class="p">)</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-65'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-65'>#</a>
|
|
</div>
|
|
                <p>If it’s a chance event $P(h) = c$, sample an action $a$ and go to the next step.</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">612</span> <span class="k">elif</span> <span class="n">h</span><span class="o">.</span><span class="n">is_chance</span><span class="p">():</span>
|
|
<span class="lineno">613</span> <span class="n">a</span> <span class="o">=</span> <span class="n">h</span><span class="o">.</span><span class="n">sample_chance</span><span class="p">()</span>
|
|
<span class="lineno">614</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">walk_tree</span><span class="p">(</span><span class="n">h</span> <span class="o">+</span> <span class="n">a</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">pi_i</span><span class="p">,</span> <span class="n">pi_neg_i</span><span class="p">)</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-66'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-66'>#</a>
|
|
</div>
|
|
<p>Get current player’s information set for $h$</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">617</span> <span class="n">I</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">_get_info_set</span><span class="p">(</span><span class="n">h</span><span class="p">)</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-67'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-67'>#</a>
|
|
</div>
|
|
<p>To store $\sum_{z \in Z_h} \pi^\sigma(h, z) u_i(z)$</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">619</span> <span class="n">v</span> <span class="o">=</span> <span class="mi">0</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-68'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-68'>#</a>
|
|
</div>
|
|
<p>To store
|
|
<script type="math/tex; mode=display">\sum_{z \in Z_h} \pi^{\sigma^t |_{I \rightarrow a}}(h, z) u_i(z)</script>
|
|
for each action $a \in A(h)$</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">623</span> <span class="n">va</span> <span class="o">=</span> <span class="p">{}</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-69'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-69'>#</a>
|
|
</div>
|
|
<p>Iterate through all actions</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">626</span> <span class="k">for</span> <span class="n">a</span> <span class="ow">in</span> <span class="n">I</span><span class="o">.</span><span class="n">actions</span><span class="p">():</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-70'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-70'>#</a>
|
|
</div>
|
|
<p>If the current player is $i$,</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">628</span> <span class="k">if</span> <span class="n">i</span> <span class="o">==</span> <span class="n">h</span><span class="o">.</span><span class="n">player</span><span class="p">():</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-71'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-71'>#</a>
|
|
</div>
|
|
<p>
<script type="math/tex; mode=display">\begin{align}
\pi^{\sigma^t}_i(h + a) &= \pi^{\sigma^t}_i(h) \sigma^t_i(I)(a) \\
\pi^{\sigma^t}_{-i}(h + a) &= \pi^{\sigma^t}_{-i}(h)
\end{align}</script>
</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">633</span> <span class="n">va</span><span class="p">[</span><span class="n">a</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">walk_tree</span><span class="p">(</span><span class="n">h</span> <span class="o">+</span> <span class="n">a</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">pi_i</span> <span class="o">*</span> <span class="n">I</span><span class="o">.</span><span class="n">strategy</span><span class="p">[</span><span class="n">a</span><span class="p">],</span> <span class="n">pi_neg_i</span><span class="p">)</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-72'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-72'>#</a>
|
|
</div>
|
|
<p>Otherwise,</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">635</span> <span class="k">else</span><span class="p">:</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-73'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-73'>#</a>
|
|
</div>
|
|
<p>
<script type="math/tex; mode=display">\begin{align}
\pi^{\sigma^t}_i(h + a) &= \pi^{\sigma^t}_i(h) \\
\pi^{\sigma^t}_{-i}(h + a) &= \pi^{\sigma^t}_{-i}(h) \sigma^t_{-i}(I)(a)
\end{align}</script>
</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">640</span> <span class="n">va</span><span class="p">[</span><span class="n">a</span><span class="p">]</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">walk_tree</span><span class="p">(</span><span class="n">h</span> <span class="o">+</span> <span class="n">a</span><span class="p">,</span> <span class="n">i</span><span class="p">,</span> <span class="n">pi_i</span><span class="p">,</span> <span class="n">pi_neg_i</span> <span class="o">*</span> <span class="n">I</span><span class="o">.</span><span class="n">strategy</span><span class="p">[</span><span class="n">a</span><span class="p">])</span></pre></div>
|
|
</div>
|
|
</div>
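<div class='section'>
<div class='docs'>
<p>The two cases above can be sketched as a single function. This is only an illustration, not part of the implementation: <code>child_reach</code> is a hypothetical helper that returns the pair of reach probabilities passed to the recursive call, and the numbers are made up. Only the acting side's reach probability is multiplied by the action probability.</p>
</div>
<div class='code'>
```python
from typing import Tuple


def child_reach(pi_i: float, pi_neg_i: float, action_prob: float,
                acting_is_i: bool) -> Tuple[float, float]:
    """Reach probabilities of h + a given those of h (hypothetical helper)."""
    if acting_is_i:
        # Player i acts: only player i's contribution changes
        return pi_i * action_prob, pi_neg_i
    # Another player acts: only the contribution of the others changes
    return pi_i, pi_neg_i * action_prob


# Made-up numbers for illustration
own = child_reach(0.5, 0.4, 0.25, acting_is_i=True)
other = child_reach(0.5, 0.4, 0.25, acting_is_i=False)
```
</div>
</div>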
|
|
<div class='section' id='section-74'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-74'>#</a>
|
|
</div>
|
|
<p>
<script type="math/tex; mode=display">\sum_{z \in Z_h} \pi^{\sigma^t}(h, z) u_i(z) =
\sum_{a \in A(I)} \Bigg[ \sigma^t_i(I)(a)
\sum_{z \in Z_h} \pi^{\sigma^t |_{I \rightarrow a}}(h, z) u_i(z)
\Bigg]</script>
</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">645</span> <span class="n">v</span> <span class="o">=</span> <span class="n">v</span> <span class="o">+</span> <span class="n">I</span><span class="o">.</span><span class="n">strategy</span><span class="p">[</span><span class="n">a</span><span class="p">]</span> <span class="o">*</span> <span class="n">va</span><span class="p">[</span><span class="n">a</span><span class="p">]</span></pre></div>
|
|
</div>
|
|
</div>
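<div class='section'>
<div class='docs'>
<p>As a small worked example of the weighted sum above, the sketch below computes the value of a history from per-action values. <code>node_value</code> is a hypothetical name, and the action labels and utilities are made up for illustration.</p>
</div>
<div class='code'>
```python
from typing import Dict


def node_value(strategy: Dict[str, float], va: Dict[str, float]) -> float:
    """Expected utility at a history: action values weighted by the strategy."""
    v = 0.0
    for a, p in strategy.items():
        v = v + p * va[a]
    return v


# Two actions with made-up utilities: 0.25 * (-1.0) + 0.75 * 2.0
value = node_value({'p': 0.25, 'b': 0.75}, {'p': -1.0, 'b': 2.0})
```
</div>
</div>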
|
|
<div class='section' id='section-75'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-75'>#</a>
|
|
</div>
|
|
<p>If the current player is $i$,
|
|
update the cumulative strategies and total regrets</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">649</span> <span class="k">if</span> <span class="n">h</span><span class="o">.</span><span class="n">player</span><span class="p">()</span> <span class="o">==</span> <span class="n">i</span><span class="p">:</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-76'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-76'>#</a>
|
|
</div>
|
|
<p>Update cumulative strategies
<script type="math/tex; mode=display">\sum_{t=1}^T \pi_i^{\sigma^t}(I)\color{lightgreen}{\sigma^t(I)(a)}
= \sum_{t=1}^T \Big[ \sum_{h \in I} \pi_i^{\sigma^t}(h)
\color{lightgreen}{\sigma^t(I)(a)} \Big]</script>
</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">654</span> <span class="k">for</span> <span class="n">a</span> <span class="ow">in</span> <span class="n">I</span><span class="o">.</span><span class="n">actions</span><span class="p">():</span>
|
|
<span class="lineno">655</span> <span class="n">I</span><span class="o">.</span><span class="n">cumulative_strategy</span><span class="p">[</span><span class="n">a</span><span class="p">]</span> <span class="o">=</span> <span class="n">I</span><span class="o">.</span><span class="n">cumulative_strategy</span><span class="p">[</span><span class="n">a</span><span class="p">]</span> <span class="o">+</span> <span class="n">pi_i</span> <span class="o">*</span> <span class="n">I</span><span class="o">.</span><span class="n">strategy</span><span class="p">[</span><span class="n">a</span><span class="p">]</span></pre></div>
|
|
</div>
|
|
</div>
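<div class='section'>
<div class='docs'>
<p>A minimal sketch of how the running sum above can be turned into an average strategy. The names <code>accumulate</code> and <code>average</code> are hypothetical, and the normalization with a uniform fallback is an assumption about what an average-strategy computation looks like, not a copy of <code>get_average_strategy</code>.</p>
</div>
<div class='code'>
```python
from typing import Dict


def accumulate(cumulative: Dict[str, float], strategy: Dict[str, float],
               pi_i: float) -> None:
    """Add pi_i * sigma(I)(a) to the running sum of each action, in place."""
    for a, p in strategy.items():
        cumulative[a] = cumulative.get(a, 0.0) + pi_i * p


def average(cumulative: Dict[str, float]) -> Dict[str, float]:
    """Normalize the running sums; fall back to uniform if nothing was added."""
    total = sum(cumulative.values())
    if total > 0:
        return {a: s / total for a, s in cumulative.items()}
    n = len(cumulative)
    return {a: 1.0 / n for a in cumulative}


cum = {'p': 0.0, 'b': 0.0}
accumulate(cum, {'p': 0.5, 'b': 0.5}, pi_i=1.0)  # first visit
accumulate(cum, {'p': 0.0, 'b': 1.0}, pi_i=0.5)  # second visit, reached less often
avg = average(cum)
```
</div>
</div>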
|
|
<div class='section' id='section-77'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-77'>#</a>
|
|
</div>
|
|
<p>
<script type="math/tex; mode=display">\begin{align}
\color{coral}{\tilde{r}^t_i(I, a)} &=
\color{pink}{\tilde{v}_i(\sigma^t |_{I \rightarrow a}, I)} -
\color{pink}{\tilde{v}_i(\sigma^t, I)} \\
&=
\sum_{h \in I} \pi^{\sigma^t}_{-i} (h) \Big(
\sum_{z \in Z_h} \pi^{\sigma^t |_{I \rightarrow a}}(h, z) u_i(z) -
\sum_{z \in Z_h} \pi^{\sigma^t}(h, z) u_i(z)
\Big) \\
T \color{orange}{R^T_i(I, a)} &=
\sum_{t=1}^T \color{coral}{\tilde{r}^t_i(I, a)}
\end{align}</script>
</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">668</span> <span class="k">for</span> <span class="n">a</span> <span class="ow">in</span> <span class="n">I</span><span class="o">.</span><span class="n">actions</span><span class="p">():</span>
|
|
<span class="lineno">669</span> <span class="n">I</span><span class="o">.</span><span class="n">regret</span><span class="p">[</span><span class="n">a</span><span class="p">]</span> <span class="o">+=</span> <span class="n">pi_neg_i</span> <span class="o">*</span> <span class="p">(</span><span class="n">va</span><span class="p">[</span><span class="n">a</span><span class="p">]</span> <span class="o">-</span> <span class="n">v</span><span class="p">)</span></pre></div>
|
|
</div>
|
|
</div>
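<div class='section'>
<div class='docs'>
<p>The regret accumulation above, sketched on plain dictionaries. <code>add_regrets</code> is a hypothetical helper and the numbers are made up. Each visit to a history in $I$ adds its own term, weighted by how likely the other players are to reach that history.</p>
</div>
<div class='code'>
```python
from typing import Dict


def add_regrets(regret: Dict[str, float], va: Dict[str, float], v: float,
                pi_neg_i: float) -> None:
    """Accumulate counterfactual regrets pi_neg_i * (va[a] - v), in place."""
    for a in va:
        regret[a] = regret.get(a, 0.0) + pi_neg_i * (va[a] - v)


regret = {'p': 0.0, 'b': 0.0}
# Made-up values: node value 1.25, the others reach this history half the time
add_regrets(regret, {'p': -1.0, 'b': 2.0}, v=1.25, pi_neg_i=0.5)
```
</div>
</div>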
|
|
<div class='section' id='section-78'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-78'>#</a>
|
|
</div>
|
|
<p>Update the strategy $\color{lightgreen}{\sigma^t(I)(a)}$</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">672</span> <span class="n">I</span><span class="o">.</span><span class="n">calculate_strategy</span><span class="p">()</span></pre></div>
|
|
</div>
|
|
</div>
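<div class='section'>
<div class='docs'>
<p><code>I.calculate_strategy()</code> is defined elsewhere in the document. The sketch below assumes it applies the regret matching rule from the CFR paper: play each action in proportion to its positive cumulative regret, and play uniformly when no action has positive regret. <code>regret_matching</code> is a hypothetical standalone function, not the method itself.</p>
</div>
<div class='code'>
```python
from typing import Dict


def regret_matching(regret: Dict[str, float]) -> Dict[str, float]:
    """Play each action in proportion to its positive cumulative regret."""
    positive = {a: max(r, 0.0) for a, r in regret.items()}
    total = sum(positive.values())
    if total > 0:
        return {a: r / total for a, r in positive.items()}
    # No action has positive regret: play uniformly
    return {a: 1.0 / len(regret) for a in regret}


sigma = regret_matching({'p': -1.125, 'b': 0.375})
uniform = regret_matching({'p': -1.0, 'b': 0.0})
```
</div>
</div>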
|
|
<div class='section' id='section-79'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-79'>#</a>
|
|
</div>
|
|
<p>Return the expected utility for player $i$,
<script type="math/tex; mode=display">\sum_{z \in Z_h} \pi^\sigma(h, z) u_i(z)</script>
</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">676</span> <span class="k">return</span> <span class="n">v</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-80'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-80'>#</a>
|
|
</div>
|
|
<h3>Iteratively update $\color{lightgreen}{\sigma^t(I)(a)}$</h3>
|
|
<p>This runs the strategy updates for $T$ iterations, where $T$ is <code>epochs</code> in the code.</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">678</span> <span class="k">def</span> <span class="nf">iterate</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-81'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-81'>#</a>
|
|
</div>
|
|
<p>Loop <code>epochs</code> times</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">686</span> <span class="k">for</span> <span class="n">t</span> <span class="ow">in</span> <span class="n">monit</span><span class="o">.</span><span class="n">iterate</span><span class="p">(</span><span class="s1">'Train'</span><span class="p">,</span> <span class="bp">self</span><span class="o">.</span><span class="n">epochs</span><span class="p">):</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-82'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-82'>#</a>
|
|
</div>
|
|
<p>Walk the tree and update the regrets for each player</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">688</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">n_players</span><span class="p">):</span>
|
|
<span class="lineno">689</span> <span class="bp">self</span><span class="o">.</span><span class="n">walk_tree</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">create_new_history</span><span class="p">(),</span> <span class="n">cast</span><span class="p">(</span><span class="n">Player</span><span class="p">,</span> <span class="n">i</span><span class="p">),</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-83'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-83'>#</a>
|
|
</div>
|
|
<p>Track data for analytics</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">692</span> <span class="n">tracker</span><span class="o">.</span><span class="n">add_global_step</span><span class="p">()</span>
|
|
<span class="lineno">693</span> <span class="bp">self</span><span class="o">.</span><span class="n">tracker</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">info_sets</span><span class="p">)</span>
|
|
<span class="lineno">694</span> <span class="n">tracker</span><span class="o">.</span><span class="n">save</span><span class="p">()</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-84'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-84'>#</a>
|
|
</div>
|
|
<p>Save checkpoints every 1,000 iterations</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">697</span> <span class="k">if</span> <span class="p">(</span><span class="n">t</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">%</span> <span class="mi">1_000</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
|
|
<span class="lineno">698</span> <span class="n">experiment</span><span class="o">.</span><span class="n">save_checkpoint</span><span class="p">()</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-85'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-85'>#</a>
|
|
</div>
|
|
<p>Print the information sets</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">701</span> <span class="n">logger</span><span class="o">.</span><span class="n">inspect</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">info_sets</span><span class="p">)</span></pre></div>
|
|
</div>
|
|
</div>
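<div class='section'>
<div class='docs'>
<p>To make the loop structure concrete, the sketch below lists the order of tree walks and the iterations at which the condition <code>(t + 1) % every == 0</code> fires. <code>schedule</code> and its parameters are hypothetical, and the monitoring, tracking and checkpoint calls are left out. Each iteration walks the tree once per player, and with <code>every=3</code> the third and sixth iterations (indices 2 and 5) would save.</p>
</div>
<div class='code'>
```python
from typing import List, Tuple


def schedule(epochs: int, n_players: int,
             every: int) -> Tuple[List[Tuple[int, int]], List[int]]:
    """List the (iteration, player) tree walks and the checkpoint iterations."""
    # One tree walk per player inside every iteration
    walks = [(t, i) for t in range(epochs) for i in range(n_players)]
    # Same test as in the loop above, with a smaller period for illustration
    checkpoints = [t for t in range(epochs) if (t + 1) % every == 0]
    return walks, checkpoints


walks, checkpoints = schedule(epochs=6, n_players=2, every=3)
```
</div>
</div>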
|
|
<div class='section' id='section-86'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-86'>#</a>
|
|
</div>
|
|
<h3>Information set tracker</h3>
|
|
<p>This is a small helper class to track data from information sets</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">704</span><span class="k">class</span> <span class="nc">InfoSetTracker</span><span class="p">:</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-87'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-87'>#</a>
|
|
</div>
|
|
<p>Set tracking indicators</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">710</span> <span class="k">def</span> <span class="fm">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-88'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-88'>#</a>
|
|
</div>
|
|
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">714</span> <span class="n">tracker</span><span class="o">.</span><span class="n">set_histogram</span><span class="p">(</span><span class="sa">f</span><span class="s1">'strategy.*'</span><span class="p">)</span>
|
|
<span class="lineno">715</span> <span class="n">tracker</span><span class="o">.</span><span class="n">set_histogram</span><span class="p">(</span><span class="sa">f</span><span class="s1">'average_strategy.*'</span><span class="p">)</span>
|
|
<span class="lineno">716</span> <span class="n">tracker</span><span class="o">.</span><span class="n">set_histogram</span><span class="p">(</span><span class="sa">f</span><span class="s1">'regret.*'</span><span class="p">)</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-89'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-89'>#</a>
|
|
</div>
|
|
<p>Track the data from all information sets</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">718</span> <span class="k">def</span> <span class="fm">__call__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">info_sets</span><span class="p">:</span> <span class="n">Dict</span><span class="p">[</span><span class="nb">str</span><span class="p">,</span> <span class="n">InfoSet</span><span class="p">]):</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-90'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-90'>#</a>
|
|
</div>
|
|
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">722</span> <span class="k">for</span> <span class="n">I</span> <span class="ow">in</span> <span class="n">info_sets</span><span class="o">.</span><span class="n">values</span><span class="p">():</span>
|
|
<span class="lineno">723</span> <span class="n">avg_strategy</span> <span class="o">=</span> <span class="n">I</span><span class="o">.</span><span class="n">get_average_strategy</span><span class="p">()</span>
|
|
<span class="lineno">724</span> <span class="k">for</span> <span class="n">a</span> <span class="ow">in</span> <span class="n">I</span><span class="o">.</span><span class="n">actions</span><span class="p">():</span>
|
|
<span class="lineno">725</span> <span class="n">tracker</span><span class="o">.</span><span class="n">add</span><span class="p">({</span>
|
|
<span class="lineno">726</span> <span class="sa">f</span><span class="s1">'strategy.</span><span class="si">{</span><span class="n">I</span><span class="o">.</span><span class="n">key</span><span class="si">}</span><span class="s1">.</span><span class="si">{</span><span class="n">a</span><span class="si">}</span><span class="s1">'</span><span class="p">:</span> <span class="n">I</span><span class="o">.</span><span class="n">strategy</span><span class="p">[</span><span class="n">a</span><span class="p">],</span>
|
|
<span class="lineno">727</span> <span class="sa">f</span><span class="s1">'average_strategy.</span><span class="si">{</span><span class="n">I</span><span class="o">.</span><span class="n">key</span><span class="si">}</span><span class="s1">.</span><span class="si">{</span><span class="n">a</span><span class="si">}</span><span class="s1">'</span><span class="p">:</span> <span class="n">avg_strategy</span><span class="p">[</span><span class="n">a</span><span class="p">],</span>
|
|
<span class="lineno">728</span> <span class="sa">f</span><span class="s1">'regret.</span><span class="si">{</span><span class="n">I</span><span class="o">.</span><span class="n">key</span><span class="si">}</span><span class="s1">.</span><span class="si">{</span><span class="n">a</span><span class="si">}</span><span class="s1">'</span><span class="p">:</span> <span class="n">I</span><span class="o">.</span><span class="n">regret</span><span class="p">[</span><span class="n">a</span><span class="p">],</span>
|
|
<span class="lineno">729</span> <span class="p">})</span></pre></div>
|
|
</div>
|
|
</div>
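<div class='section'>
<div class='docs'>
<p>The indicator names built above follow one pattern per quantity. The sketch below reproduces only the naming, without calling the tracker. <code>indicator_names</code> is a hypothetical helper, and the key <code>'Kp'</code> and action <code>'b'</code> are made-up examples.</p>
</div>
<div class='code'>
```python
from typing import Dict


def indicator_names(key: str, action: str) -> Dict[str, str]:
    """Names of the three indicators tracked for one information set and action."""
    return {kind: f'{kind}.{key}.{action}'
            for kind in ('strategy', 'average_strategy', 'regret')}


# Hypothetical information set key and action label
names = indicator_names('Kp', 'b')
```
</div>
</div>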
|
|
<div class='section' id='section-91'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-91'>#</a>
|
|
</div>
|
|
<h3>Configurable CFR module</h3>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">732</span><span class="k">class</span> <span class="nc">CFRConfigs</span><span class="p">(</span><span class="n">BaseConfigs</span><span class="p">):</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-92'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-92'>#</a>
|
|
</div>
|
|
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">736</span> <span class="n">create_new_history</span><span class="p">:</span> <span class="n">Callable</span><span class="p">[[],</span> <span class="n">History</span><span class="p">]</span>
|
|
<span class="lineno">737</span> <span class="n">epochs</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1_00_000</span>
|
|
<span class="lineno">738</span> <span class="n">cfr</span><span class="p">:</span> <span class="n">CFR</span> <span class="o">=</span> <span class="s1">'simple_cfr'</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-93'>
|
|
<div class='docs doc-strings'>
|
|
<div class='section-link'>
|
|
<a href='#section-93'>#</a>
|
|
</div>
|
|
<p>Initialize <strong>CFR</strong> algorithm</p>
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">741</span><span class="nd">@option</span><span class="p">(</span><span class="n">CFRConfigs</span><span class="o">.</span><span class="n">cfr</span><span class="p">)</span>
|
|
<span class="lineno">742</span><span class="k">def</span> <span class="nf">simple_cfr</span><span class="p">(</span><span class="n">c</span><span class="p">:</span> <span class="n">CFRConfigs</span><span class="p">):</span></pre></div>
|
|
</div>
|
|
</div>
|
|
<div class='section' id='section-94'>
|
|
<div class='docs'>
|
|
<div class='section-link'>
|
|
<a href='#section-94'>#</a>
|
|
</div>
|
|
|
|
</div>
|
|
<div class='code'>
|
|
<div class="highlight"><pre><span class="lineno">746</span> <span class="k">return</span> <span class="n">CFR</span><span class="p">(</span><span class="n">create_new_history</span><span class="o">=</span><span class="n">c</span><span class="o">.</span><span class="n">create_new_history</span><span class="p">,</span>
|
|
<span class="lineno">747</span> <span class="n">epochs</span><span class="o">=</span><span class="n">c</span><span class="o">.</span><span class="n">epochs</span><span class="p">)</span></pre></div>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
</div>
|
|
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.4/MathJax.js?config=TeX-AMS_HTML">
|
|
</script>
|
|
<!-- MathJax configuration -->
|
|
<script type="text/x-mathjax-config">
|
|
MathJax.Hub.Config({
|
|
tex2jax: {
|
|
inlineMath: [ ['$','$'] ],
|
|
displayMath: [ ['$$','$$'] ],
|
|
processEscapes: true,
|
|
processEnvironments: true
|
|
},
|
|
// Center justify equations in code and markdown cells. Elsewhere
|
|
// we use CSS to left justify single line equations in code cells.
|
|
displayAlign: 'center',
|
|
"HTML-CSS": { fonts: ["TeX"] }
|
|
});
|
|
</script>
|
|
<script>
|
|
function handleImages() {
|
|
var images = document.querySelectorAll('p>img')
|
|
|
|
console.log(images);
|
|
for (var i = 0; i < images.length; ++i) {
|
|
handleImage(images[i])
|
|
}
|
|
}
|
|
|
|
function handleImage(img) {
|
|
img.parentElement.style.textAlign = 'center'
|
|
|
|
var modal = document.createElement('div')
|
|
modal.id = 'modal'
|
|
|
|
var modalContent = document.createElement('div')
|
|
modal.appendChild(modalContent)
|
|
|
|
var modalImage = document.createElement('img')
|
|
modalContent.appendChild(modalImage)
|
|
|
|
var span = document.createElement('span')
|
|
span.classList.add('close')
|
|
span.textContent = 'x'
|
|
modal.appendChild(span)
|
|
|
|
img.onclick = function () {
|
|
console.log('clicked')
|
|
document.body.appendChild(modal)
|
|
modalImage.src = img.src
|
|
}
|
|
|
|
span.onclick = function () {
|
|
document.body.removeChild(modal)
|
|
}
|
|
}
|
|
|
|
handleImages()
|
|
</script>
|
|
</body>
|
|
</html> |