This applies Counterfactual Regret Minimization (CFR) to Kuhn poker.
Kuhn poker is a two-player, three-card betting game. Each player is dealt one card from a pack of Ace, King and Queen (no suits). The pack has only three cards, so one card is left out. Ace beats King and Queen, and King beats Queen, just like in the normal ranking of cards.
Both players ante $1$ chip (a blind bet of $1$ chip). After looking at their cards, the first player can either pass or bet $1$ chip. If the first player passes, the player with the higher card wins the pot. If the first player bets, the second player can bet (i.e. call) $1$ chip or pass (i.e. fold). If the second player bets, the player with the higher card wins the pot. If the second player passes (i.e. folds), the first player gets the pot. The game is played repeatedly, and a good strategy optimizes the long-term utility (or winnings).
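The payoff rules above can be sketched as a small standalone function. This is only an illustration: the names `RANK` and `payoff_p1` are hypothetical and are not part of `labml_nn`. The return value is Player 1's net gain in chips, not the pot size.

```python
# Hypothetical helper for illustration; not part of labml_nn.
# Card strength: a higher number beats a lower one.
RANK = {'A': 3, 'K': 2, 'Q': 1}


def payoff_p1(p1_card: str, p2_card: str, actions: str) -> int:
    """Net chips won by Player 1 under the simplified rules described above."""
    # +1 if Player 1 holds the higher card, -1 otherwise
    sign = 1 if RANK[p1_card] > RANK[p2_card] else -1
    if actions == 'p':
        # Player 1 passes: showdown for the antes
        return sign
    if actions == 'bp':
        # Player 1 bets and Player 2 folds: Player 1 wins Player 2's ante
        return 1
    if actions == 'bb':
        # Bet and call: showdown for ante plus bet
        return 2 * sign
    raise ValueError(f'not a complete betting sequence: {actions!r}')
```

For instance, `payoff_p1('Q', 'K', 'bp')` gives `1`: a bluff with the lowest card still wins the opponent's ante when the opponent folds.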
Here are some example games:
KAp - Player 1 gets K. Player 2 gets A. Player 1 passes. Player 2 doesn't get a betting chance and Player 2 wins the pot of $2$ chips.
QKbp - Player 1 gets Q. Player 2 gets K. Player 1 bets a chip. Player 2 passes (folds). Player 1 gets the pot of $3$ chips (two antes plus Player 1's own bet) because Player 2 folded.
QAbb - Player 1 gets Q. Player 2 gets A. Player 1 bets a chip. Player 2 also bets (calls). Player 2 wins the pot of $4$ chips.
Here we extend the InfoSet class and History class defined in __init__.py with Kuhn poker specifics.
from typing import List, cast, Dict

import numpy as np

from labml import experiment
from labml.configs import option
from labml_nn.cfr import History as _History, InfoSet as _InfoSet, Action, Player, CFRConfigs
from labml_nn.cfr.infoset_saver import InfoSetSaver

Kuhn poker actions are pass (p) or bet (b)
ACTIONS = cast(List[Action], ['p', 'b'])

The three cards in play are Ace, King and Queen
CHANCES = cast(List[Action], ['A', 'K', 'Q'])

There are two players
PLAYERS = cast(List[Player], [0, 1])

class InfoSet(_InfoSet):

Does not support save/load
    @staticmethod
    def from_dict(data: Dict[str, any]) -> 'InfoSet':
        pass

Return the list of actions. Terminal states are handled by the History class.
    def actions(self) -> List[Action]:
        return ACTIONS

Human readable string representation. It gives the betting probability.
    def __repr__(self):
        total = sum(self.cumulative_strategy.values())
        total = max(total, 1e-6)
        bet = self.cumulative_strategy[cast(Action, 'b')] / total
        return f'{bet * 100: .1f}%'

This defines when a game ends, calculates the utility and samples chance events (dealing cards).
The history is stored in a string:
* The first two characters are the cards dealt to player 1 and player 2
* The third character is the action by the first player
* The fourth character is the action by the second player
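To make the string layout above concrete, here is a minimal sketch that splits a history string into its four positions. The names `Decoded` and `decode` are hypothetical helpers for illustration and do not exist in `labml_nn`; positions that have not been played yet are returned as `None`.

```python
from typing import NamedTuple, Optional


class Decoded(NamedTuple):
    """The four positions of a Kuhn poker history string (hypothetical helper)."""
    p1_card: Optional[str]
    p2_card: Optional[str]
    p1_action: Optional[str]
    p2_action: Optional[str]


def decode(history: str) -> Decoded:
    # Pad with None so that partial histories such as '' or 'KA' also decode
    padded = list(history) + [None] * (4 - len(history))
    return Decoded(*padded[:4])
```

With this sketch, `decode('QKbp')` yields Queen for player 1, King for player 2, a bet, then a pass, while `decode('KA')` has both action fields set to `None` because no player has acted yet.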
class History(_History):

History
    history: str

Initialize with a given history string
    def __init__(self, history: str = ''):
        self.history = history

Whether the history is terminal (game over).
    def is_terminal(self):

Players are yet to take actions
        if len(self.history) <= 2:
            return False

Last player to play passed (game over)
        elif self.history[-1] == 'p':
            return True

Both players called (bet) (game over)
        elif self.history[-2:] == 'bb':
            return True

Any other combination
        else:
            return False

Calculate the terminal utility for player $1$, $u_1(z)$
    def _terminal_utility_p1(self) -> float:

$+1$ if Player 1 has a better card and $-1$ otherwise
        winner = -1 + 2 * (self.history[0] < self.history[1])

Second player passed
        if self.history[-2:] == 'bp':
            return 1

Both players called, the player with better card wins $2$ chips
        elif self.history[-2:] == 'bb':
            return winner * 2

First player passed, the player with better card wins $1$ chip
        elif self.history[-1] == 'p':
            return winner

History is non-terminal
        else:
            raise RuntimeError()

Get the terminal utility for player $i$
    def terminal_utility(self, i: Player) -> float:

If $i$ is Player 1
        if i == PLAYERS[0]:
            return self._terminal_utility_p1()

Otherwise, $u_2(z) = -u_1(z)$
        else:
            return -1 * self._terminal_utility_p1()

The first two events are card dealing; i.e. chance events
    def is_chance(self) -> bool:
        return len(self.history) < 2

Add an action to the history and return a new history
    def __add__(self, other: Action):
        return History(self.history + other)

Current player
    def player(self) -> Player:
        return cast(Player, len(self.history) % 2)

Sample a chance action
    def sample_chance(self) -> Action:
        while True:

Randomly pick a card
            r = np.random.randint(len(CHANCES))
            chance = CHANCES[r]

See if the card was dealt before
            for c in self.history:
                if c == chance:
                    chance = None
                    break

Return the card if it was not dealt before
            if chance is not None:
                return cast(Action, chance)

Human readable representation
    def __repr__(self):
        return repr(self.history)

Information set key for the current history. This is a string of actions only visible to the current player.
    def info_set_key(self) -> str:

Get current player
        i = self.player()

Current player sees her card and the betting actions
        return self.history[i] + self.history[2:]

    def new_info_set(self) -> InfoSet:

Create a new information set object
        return InfoSet(self.info_set_key())

A function to create an empty history object
def create_new_history():
    return History()

The configurations class extends the CFR configurations class
class Configs(CFRConfigs):
    pass

Set the create_new_history method for Kuhn poker
@option(Configs.create_new_history)
def _cnh():
    return create_new_history

def main():

Create an experiment. We only write tracking information to SQLite to speed things up. Since the algorithm iterates fast and we track data on each iteration, writing to other destinations such as TensorBoard can be relatively time consuming. SQLite is enough for our analytics.
    experiment.create(name='kuhn_poker', writers={'sqlite', 'screen'})

Initialize configuration
    conf = Configs()

Load configuration
    experiment.configs(conf)

Set models for saving
    experiment.add_model_savers({'info_sets': InfoSetSaver(conf.cfr.info_sets)})

Start the experiment
    with experiment.start():

Start iterating
        conf.cfr.iterate()

if __name__ == '__main__':
    main()
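As a supplement, the following standalone sketch enumerates the information-set keys that arise under the rules used in this walkthrough. It re-implements the key logic of `info_set_key` on plain strings so that it runs without `labml_nn`; the names `CARDS`, `decision_histories` and `keys` are hypothetical and only serve the illustration.

```python
from itertools import permutations

# Hypothetical standalone names; the logic mirrors History.info_set_key above.
CARDS = ['A', 'K', 'Q']


def info_set_key(history: str) -> str:
    # The player to act is determined by the parity of the history length
    player = len(history) % 2
    # The key is the acting player's own card followed by the betting actions
    return history[player] + history[2:]


def decision_histories():
    """Yield every non-terminal history at which a player has to act."""
    for c1, c2 in permutations(CARDS, 2):
        # Player 1 acts right after the deal
        yield c1 + c2
        # Player 2 acts only after Player 1 bets (a pass ends the game)
        yield c1 + c2 + 'b'


keys = sorted({info_set_key(h) for h in decision_histories()})
print(keys)
```

The twelve decision histories collapse into six information sets, one per card for each player, because a player cannot distinguish histories that differ only in the opponent's hidden card.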