#

Counterfactual Regret Minimization (CFR) on Kuhn Poker

This applies Counterfactual Regret Minimization (CFR) to Kuhn poker.

Kuhn Poker is a two-player betting game played with three cards. Each player is dealt one card out of Ace, King and Queen (no suits). Since the pack has only three cards, one card is left undealt. Ace beats King and Queen, and King beats Queen, as in the usual ranking of cards.

Both players ante $1$ chip (blindly bet $1$ chip). After looking at the cards, the first player can either pass or bet $1$ chip. If the first player passes, the player with the higher card wins the pot. If the first player bets, the second player can bet (i.e. call) $1$ chip or pass (i.e. fold). If the second player bets, the player with the higher card wins the pot. If the second player passes (i.e. folds), the first player gets the pot. The game is played repeatedly, and a good strategy optimizes the long-term utility (or winnings).

Here are some example games:

KAp - Player 1 gets K. Player 2 gets A. Player 1 passes. Player 2 doesn't get a betting chance and Player 2 wins the pot of $2$ chips.
QKbp - Player 1 gets Q. Player 2 gets K. Player 1 bets a chip. Player 2 passes (folds). Player 1 gets the pot of $3$ chips (two antes plus Player 1's bet) because Player 2 folded.
QAbb - Player 1 gets Q. Player 2 gets A. Player 1 bets a chip. Player 2 also bets (calls). Player 2 wins the pot of $4$ chips.
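
The pot sizes follow from counting chips: two antes plus one chip per bet. A minimal sketch of that bookkeeping is given here; `settle`, `RANK` and `paid` are hypothetical names used only for illustration and are not part of the code walked through in this document.

```python
from typing import Tuple

# Higher number means a stronger card (illustrative helper, not from labml_nn)
RANK = {'Q': 0, 'K': 1, 'A': 2}


def settle(history: str) -> Tuple[int, int, int]:
    """Return (pot, winner, net_win) for a finished game such as 'QKbp'.

    `winner` is 0 for Player 1 and 1 for Player 2;
    `net_win` is the pot minus the chips the winner put in.
    """
    cards, bets = history[:2], history[2:]
    # Both players ante one chip
    paid = [1, 1]
    # The first player's bet adds a chip
    if bets.startswith('b'):
        paid[0] += 1
    # The second player's call adds another chip
    if bets == 'bb':
        paid[1] += 1
    pot = sum(paid)
    if bets == 'bp':
        # The second player folded, so the first player takes the pot
        winner = 0
    else:
        # Showdown: the higher card wins
        winner = 0 if RANK[cards[0]] > RANK[cards[1]] else 1
    return pot, winner, pot - paid[winner]


for game in ['KAp', 'QKbp', 'QAbb']:
    print(game, settle(game))
```

The net win (pot minus the winner's own chips) is the quantity that the utility function later in this document works with: $1$ chip after a pass or a fold, $2$ chips after a call.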

Here we extend the InfoSet class and History class defined in __init__.py with Kuhn Poker specifics.

from typing import List, cast, Dict

import numpy as np

from labml import experiment
from labml.configs import option
from labml_nn.cfr import History as _History, InfoSet as _InfoSet, Action, Player, CFRConfigs

#

Kuhn poker actions are pass (p) or bet (b)

ACTIONS = cast(List[Action], ['p', 'b'])

#

The three cards in play are Ace, King and Queen

CHANCES = cast(List[Action], ['A', 'K', 'Q'])

#

There are two players

PLAYERS = cast(List[Player], [0, 1])

#

Information set

class InfoSet(_InfoSet):

#

Does not support save/load

    @staticmethod
    def from_dict(data: Dict[str, any]) -> 'InfoSet':

#

        pass

#

Return the list of actions. Terminal states are handled by History class.

    def actions(self) -> List[Action]:

#

        return ACTIONS

#

Human-readable string representation. It gives the betting probability, computed by normalizing the cumulative strategy.

    def __repr__(self):

#

        total = sum(self.cumulative_strategy.values())
        total = max(total, 1e-6)
        bet = self.cumulative_strategy[cast(Action, 'b')] / total
        return f'{bet * 100: .1f}%'
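
The normalization above can be tried in isolation. The sketch below assumes the cumulative strategy is a plain dictionary keyed by action; `bet_probability` and the example numbers are hypothetical and only illustrate the arithmetic.

```python
from typing import Dict


def bet_probability(cumulative_strategy: Dict[str, float]) -> float:
    """Share of the accumulated strategy weight that went to betting."""
    # Guard against division by zero before any weight is accumulated
    total = max(sum(cumulative_strategy.values()), 1e-6)
    return cumulative_strategy['b'] / total


# Made-up weights: three units on pass, one unit on bet
example = {'p': 3.0, 'b': 1.0}
print(f'{bet_probability(example) * 100: .1f}%')  # prints " 25.0%"
```

The space in the format specification `' .1f'` reserves a leading blank where a minus sign would go, which is why the printed value starts with a space.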

#

History

This defines when a game ends, calculates the utility, and samples chance events (dealing cards).

The history is stored in a string:

First two characters are the cards dealt to player 1 and player 2
The third character is the action by the first player
Fourth character is the action by the second player

class History(_History):

#

History

    history: str

#

Initialize with a given history string

    def __init__(self, history: str = ''):

#

        self.history = history

#

Whether the history is terminal (game over).

    def is_terminal(self):

#

Players are yet to take actions

        if len(self.history) <= 2:
            return False

#

Last player to play passed (game over)

        elif self.history[-1] == 'p':
            return True

#

Both players called (bet) (game over)

        elif self.history[-2:] == 'bb':
            return True

#

Any other combination

        else:
            return False

#

Calculate the terminal utility for player $1$, $u_1(z)$

    def _terminal_utility_p1(self) -> float:

#

$+1$ if Player 1 has a better card and $-1$ otherwise. The string comparison works because 'A' < 'K' < 'Q' alphabetically, so the smaller character is the higher-ranked card.

        winner = -1 + 2 * (self.history[0] < self.history[1])

#

Second player passed (folded), so the first player wins $1$ chip

        if self.history[-2:] == 'bp':
            return 1

#

Both players bet, the player with the better card wins $2$ chips

        elif self.history[-2:] == 'bb':
            return winner * 2

#

First player passed, the player with the better card wins $1$ chip

        elif self.history[-1] == 'p':
            return winner

#

History is non-terminal

        else:
            raise RuntimeError()
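
The `winner` expression relies on character ordering rather than an explicit rank table. The following sketch checks it against a rank lookup for all six deals; `RANK` and `winner_sign` are hypothetical names introduced for this check only.

```python
from itertools import permutations

# Explicit ranks for comparison (illustrative, not part of the original code)
RANK = {'A': 3, 'K': 2, 'Q': 1}


def winner_sign(card_1: str, card_2: str) -> int:
    """+1 if Player 1 holds the better card, -1 otherwise (string comparison)."""
    # True and False behave as 1 and 0 in arithmetic
    return -1 + 2 * (card_1 < card_2)


for card_1, card_2 in permutations('AKQ', 2):
    by_rank = 1 if RANK[card_1] > RANK[card_2] else -1
    # The string-comparison trick agrees with the explicit ranking
    assert winner_sign(card_1, card_2) == by_rank
    print(card_1 + card_2, winner_sign(card_1, card_2))
```

The trick works only because the three card letters happen to sort alphabetically in rank order; a deck with other letters, such as 'J', would need an explicit table.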

#

Get the terminal utility for player $i$

    def terminal_utility(self, i: Player) -> float:

#

If $i$ is Player 1

        if i == PLAYERS[0]:
            return self._terminal_utility_p1()

#

Otherwise, $u_2(z) = -u_1(z)$

        else:
            return -1 * self._terminal_utility_p1()
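
Because the game is zero-sum, the long-term utility of a pair of strategies can be computed by averaging the terminal utilities over the six equally likely deals. The sketch below does this with exact fractions. The function names are hypothetical, and the strategy profile is a set of numbers chosen for illustration, not output from the CFR run in this document.

```python
from fractions import Fraction
from itertools import permutations
from typing import Dict


def utility_p1(history: str) -> int:
    """Terminal utility for Player 1, following the rules described above."""
    winner = -1 + 2 * (history[0] < history[1])
    if history[-2:] == 'bp':
        return 1
    if history[-2:] == 'bb':
        return 2 * winner
    if history[-1] == 'p':
        return winner
    raise ValueError(f'non-terminal history: {history}')


def expected_utility_p1(bet_p1: Dict[str, Fraction],
                        call_p2: Dict[str, Fraction]) -> Fraction:
    """Average utility for Player 1 over all deals.

    bet_p1[card] is Player 1's probability of betting with that card;
    call_p2[card] is Player 2's probability of calling a bet with that card.
    """
    deals = [a + b for a, b in permutations('AKQ', 2)]
    total = Fraction(0)
    for deal in deals:
        bet, call = bet_p1[deal[0]], call_p2[deal[1]]
        total += (1 - bet) * utility_p1(deal + 'p')
        total += bet * (1 - call) * utility_p1(deal + 'bp')
        total += bet * call * utility_p1(deal + 'bb')
    return total / len(deals)


# Illustrative profile (assumed numbers)
bet_p1 = {'A': Fraction(1), 'K': Fraction(0), 'Q': Fraction(1, 3)}
call_p2 = {'A': Fraction(1), 'K': Fraction(1, 3), 'Q': Fraction(0)}
value = expected_utility_p1(bet_p1, call_p2)
print(value, -value)  # 1/18 -1/18
```

The second printed number is Player 2's expected utility, which is the negative of Player 1's.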

#

The first two events are card dealing; i.e. chance events

    def is_chance(self) -> bool:

#

        return len(self.history) < 2

#

Add an action to the history and return a new history

    def __add__(self, other: Action):

#

        return History(self.history + other)

#

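Current player. Players alternate, so after the two cards are dealt (history length $2$) the first player (index $0$) acts, and after one betting action (length $3$) the second player (index $1$) acts.

    def player(self) -> Player:

#

        return cast(Player, len(self.history) % 2)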

#

Sample a chance action

    def sample_chance(self) -> Action:

#

        while True:

#

Randomly pick a card

            r = np.random.randint(len(CHANCES))
            chance = CHANCES[r]

#

See if the card was dealt before

            for c in self.history:
                if c == chance:
                    chance = None
                    break

#

Return the card if it was not dealt before

            if chance is not None:
                return cast(Action, chance)
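
The loop above is rejection sampling: draw a card uniformly and retry if it is already in the history. The sketch below shows the same idea with the standard library next to an equivalent one-shot deal. `deal_by_rejection` and `deal_two` are hypothetical helpers, and `random` is swapped in for NumPy here only to keep the example dependency-free.

```python
import random

CARDS = ['A', 'K', 'Q']


def deal_by_rejection(history: str, rng: random.Random) -> str:
    """Draw cards uniformly until one that is not yet in the history comes up."""
    while True:
        card = rng.choice(CARDS)
        if card not in history:
            return card


def deal_two(rng: random.Random) -> str:
    """Deal both cards at once by sampling without replacement."""
    return ''.join(rng.sample(CARDS, 2))


rng = random.Random(0)
history = ''
history += deal_by_rejection(history, rng)
history += deal_by_rejection(history, rng)
print(history, deal_two(rng))
```

Both approaches give each of the six ordered deals the same probability. The rejection loop has the advantage of working one chance event at a time, which matches how the history grows by one character per step.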

#

Human-readable representation

    def __repr__(self):

#

        return repr(self.history)

#

Information set key for the current history. This is a string of the card and the actions visible to the current player.

    def info_set_key(self) -> str:

#

Get current player

        i = self.player()

#

Current player sees her card and the betting actions

        return self.history[i] + self.history[2:]
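
Enumerating the keys makes the information sets concrete. The sketch below walks every non-terminal history for every deal and collects the key seen by the player to act. It is standalone: the functions mirror the methods above on plain strings, and `collect` is a hypothetical helper.

```python
from itertools import permutations
from typing import Set, Tuple


def is_terminal(history: str) -> bool:
    return len(history) > 2 and (history[-1] == 'p' or history[-2:] == 'bb')


def info_set_key(history: str) -> str:
    # The player to act sees their own card plus all betting actions
    i = len(history) % 2
    return history[i] + history[2:]


def collect(history: str, keys: Set[Tuple[int, str]]) -> None:
    """Record (player, key) for every decision point reachable from `history`."""
    if is_terminal(history):
        return
    keys.add((len(history) % 2, info_set_key(history)))
    for action in 'pb':
        collect(history + action, keys)


keys: Set[Tuple[int, str]] = set()
for card_1, card_2 in permutations('AKQ', 2):
    collect(card_1 + card_2, keys)
print(sorted(keys))
# [(0, 'A'), (0, 'K'), (0, 'Q'), (1, 'Ab'), (1, 'Kb'), (1, 'Qb')]
```

There are six information sets in total. The second player only ever decides after a bet, because a pass by the first player ends the game.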

#

    def new_info_set(self) -> InfoSet:

#

Create a new information set object

        return InfoSet(self.info_set_key())

#

A function to create an empty history object

def create_new_history():

#

    return History()

#

The configurations class extends the CFR configurations class

class Configs(CFRConfigs):

#

    pass

#

Set the create_new_history method for Kuhn Poker

@option(Configs.create_new_history)
def _cnh():

#

    return create_new_history

#

Run the experiment

def main():

#

Create an experiment. We only write tracking information to SQLite to speed things up: the algorithm iterates fast and we track data on each iteration, so writing to other destinations such as TensorBoard can be relatively time-consuming. SQLite is enough for our analytics.

    experiment.create(name='kuhn_poker', writers={'sqlite'})

#

Initialize configuration

    conf = Configs()

#

Load configuration

    experiment.configs(conf)

#

Start the experiment

    with experiment.start():

#

Start iterating

        conf.cfr.iterate()
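
The solver behind `conf.cfr.iterate()` lives in the `labml_nn.cfr` module and is not shown in this file. For readers who want to see the whole loop in one place, here is a standalone sketch of vanilla CFR with regret matching on this Kuhn poker variant. It is an assumption-laden illustration, not the `labml_nn` implementation: all names (`train`, `walk`, `regret_matching`, `KEYS`) are hypothetical, it uses no sampling, and it ignores the constant chance probability of $1/6$ per deal.

```python
from itertools import permutations
from typing import Dict, List

ACTIONS = 'pb'
DEALS = [a + b for a, b in permutations('AKQ', 2)]
# Player 1 decides knowing only their card; Player 2 decides only after a bet
KEYS = ['A', 'K', 'Q', 'Ab', 'Kb', 'Qb']


def is_terminal(h: str) -> bool:
    return len(h) > 2 and (h[-1] == 'p' or h[-2:] == 'bb')


def utility_p1(h: str) -> int:
    winner = 1 if h[0] < h[1] else -1
    if h[2:] == 'bp':
        return 1
    if h[2:] == 'bb':
        return 2 * winner
    return winner


def regret_matching(regrets: List[float]) -> List[float]:
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def train(iterations: int) -> Dict[str, float]:
    """Run vanilla CFR and return the average betting probability per key."""
    regret = {k: [0.0, 0.0] for k in KEYS}
    strategy_sum = {k: [0.0, 0.0] for k in KEYS}

    def walk(h: str, reach: List[float], sigma: Dict[str, List[float]]) -> float:
        """Return Player 1's expected utility at h; reach[i] is player i's reach."""
        if is_terminal(h):
            return float(utility_p1(h))
        i = len(h) % 2
        key = h[i] + h[2:]
        values = []
        for action, prob in zip(ACTIONS, sigma[key]):
            next_reach = list(reach)
            next_reach[i] *= prob
            values.append(walk(h + action, next_reach, sigma))
        node_value = sum(p * v for p, v in zip(sigma[key], values))
        # Values are from Player 1's view; flip the sign for Player 2
        sign = 1.0 if i == 0 else -1.0
        for k in range(len(ACTIONS)):
            # Regret is weighted by the opponent's reach probability
            regret[key][k] += reach[1 - i] * sign * (values[k] - node_value)
            # The average strategy is weighted by the player's own reach
            strategy_sum[key][k] += reach[i] * sigma[key][k]
        return node_value

    for _ in range(iterations):
        # Fix the strategy for the whole iteration before walking the deals
        sigma = {k: regret_matching(regret[k]) for k in KEYS}
        for deal in DEALS:
            walk(deal, [1.0, 1.0], sigma)

    return {k: s[1] / sum(s) for k, s in strategy_sum.items()}


average_bet = train(10_000)
for key in KEYS:
    print(key, round(average_bet[key], 3))
```

The printed numbers are average betting probabilities per information set, the same quantity that `InfoSet.__repr__` reports. The sketch walks the full game tree for every deal on every iteration, which is affordable here only because the game is tiny.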

#

if __name__ == '__main__':
    main()