Participants: 2014 - Heads-up No-limit Texas Hold'em


Feste

  • Team Name: Feste
  • Team Members: Francois Pays
  • Affiliation: Independent
  • Location: Paris, France
  • Non-dynamic Agent
  • Technique: The card abstraction uses 169, 400, 50, and 25 buckets for the preflop, flop, turn, and river, respectively. The buckets are computed using k-means clustering over selected hand parameters, such as expected value, standard deviation, and skewness at the last round. The betting abstraction is quite coarse: half pot (only as the first bet), pot, quad-pot, and all-in.
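
    A minimal sketch of this bucketing step (in Python) might look as follows; the feature matrix here is a random placeholder, whereas Feste computes real hand parameters by rollout:

      # k-means bucketing over per-hand features (EV, std dev, skewness).
      # The feature values below are placeholders, not Feste's actual inputs.
      import numpy as np
      from sklearn.cluster import KMeans

      BUCKETS = {"flop": 400, "turn": 50, "river": 25}  # preflop keeps 169 classes

      def bucket_hands(features: np.ndarray, street: str) -> np.ndarray:
          """Assign each hand a bucket id by k-means over its feature vector."""
          km = KMeans(n_clusters=BUCKETS[street], n_init=10, random_state=0)
          return km.fit_predict(features)

      rng = np.random.default_rng(0)
      features = rng.random((10_000, 3))   # columns: EV, std dev, skewness
      flop_buckets = bucket_hands(features, "flop")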

    The abstraction is represented in sequence form with the imperfect recall extension; the resulting abstraction has only 300 million game states. It is solved using a custom interior point solver with indirect algebra [1]. The solver runs on a mid-range workstation and is GPU-accelerated with CUDA. It has been tested on up to 3 billion game states and could therefore handle abstractions ten times larger but, interestingly, neither finer card abstractions nor finer betting abstractions resulted in stronger no-limit players.

    Since Feste is not yet able to gather sufficiently accurate information about its opponent within a 3000-hand match, there is no dynamic adaptation. The instant-runoff player follows a defensive strategy and the total-bankroll player a slightly more aggressive one.


    [1] Francois Pays. 2014. An Interior Point Approach to Large Games of Incomplete Information. Proceedings of the AAAI-2014 Workshop on Computer Poker.

HibiscusBiscuit

  • Team Name: Cleverpiggy
  • Team Members: Allen Cunningham
  • Affiliation: Independent
  • Location: Marina del Rey, CA, USA
  • Non-dynamic Agent
  • Technique: Hibiscus Biscuit comprises separately trained big-blind and button strategies with knowledge of different bet sizes.  In each case the hero uses only a couple of sizes but defends against many.  Both sides use a card abstraction of 169, 20000, 18000, and 17577 buckets, built from board-card distinctions, earth mover's distance clustering for the flop and turn, and clustering over (wins, ties) for the river.
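
    A minimal sketch of the earth mover's comparison behind the flop and turn clustering, assuming each hand is summarized by a histogram over future hand strength (the histograms below are illustrative placeholders):

      # Earth mover's distance between two hand-strength histograms, the
      # pairwise measure underlying flop/turn clustering. Data is illustrative.
      import numpy as np
      from scipy.stats import wasserstein_distance

      bins = np.linspace(0.0, 1.0, 11)[:-1] + 0.05   # 10 strength bin centers

      def emd(hist_a: np.ndarray, hist_b: np.ndarray) -> float:
          """EMD between two normalized hand-strength histograms."""
          return wasserstein_distance(bins, bins, u_weights=hist_a, v_weights=hist_b)

      h1 = np.array([0.0, 0.1, 0.2, 0.3, 0.2, 0.1, 0.1, 0.0, 0.0, 0.0])
      h2 = np.array([0.0, 0.0, 0.0, 0.1, 0.1, 0.2, 0.3, 0.2, 0.1, 0.0])
      print(emd(h1, h2))   # larger distance -> hands land in different buckets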


    Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. Regret Minimization in Games with Incomplete Information. In Advances in Neural Information Processing Systems 20 (NIPS), 2007.
    Michael Johanson, Neil Burch, Richard Valenzano, and Michael Bowling. Evaluating State-Space Abstractions in Extensive-Form Games. In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2013.

Hyperborean (instant run-off)

  • Team Name: University of Alberta
  • Team Members: Michael Bowling, Duane Szafron, Rob Holte, Nolan Bard, Neil Burch, Richard Gibson, John Hawkin, Michael Johanson, Trevor Davis, Josh Davidson, Dustin Morrill
  • Affiliation: University of Alberta
  • Location: Edmonton, Alberta, Canada
  • Non-dynamic Agent
  • Technique: Hyperborean2014-NL-IRO is a Nash equilibrium approximation trained using PureCFR [1, Section 5.5], a recent CFR variant developed by Oskari Tammelin.  Its card abstraction is symmetric and uses imperfect recall, with 169 (perfect) preflop buckets, 18630 flop buckets, 3700 turn buckets, and 1175 river buckets, using the k-means Earthmover and k-means OCHS buckets recently presented by Johanson et al. [2].  The betting abstraction is asymmetric, with different bet sizes and limits for the opponent and the agent.  The opponent is anticipated to have a large number of bet sizes, including min-bets and tenth-pot bets, with higher limits than the agent on how many bets of each type can be made in each round.  The agent has a similar set of bets, including 0.1-pot and 0.25-pot but not min-bets, with bet sizes of 1.5-pot and smaller restricted to its first two actions.  This asymmetric betting abstraction lets the agent interpret many opponent actions, limiting the impact of translation errors, while still having a few unusual bet sizes (0.1, 0.25, 0.65) that may cause translation errors in the opponents.
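
    The translation issue this design targets can be seen with a simple hard translator that rounds an observed bet to the nearest abstract size; the bet menus below are illustrative, not Hyperborean's actual abstraction:

      # Hard action translation: an observed bet is mapped to the nearest
      # pot-fraction in the abstraction. Menus here are illustrative only.
      AGENT_BETS = [0.1, 0.25, 0.5, 0.65, 1.0, 2.0]        # fractions of the pot
      OPP_BETS   = [0.05, 0.1, 0.25, 0.5, 0.65, 1.0, 2.0]  # richer opponent menu

      def translate(bet: float, pot: float, menu: list[float]) -> float:
          """Map an observed bet (in chips) to the closest abstract fraction."""
          frac = bet / pot
          return min(menu, key=lambda m: abs(m - frac))

      # A 0.6-pot bet reads as 0.65-pot under the richer menu; an agent whose
      # abstraction lacked 0.65 would round it to 0.5-pot instead.
      print(translate(60, 100, OPP_BETS))    # 0.65
      print(translate(60, 100, [0.5, 1.0]))  # 0.5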

    Since this agent is asymmetric, computing its strategy required solving two abstract games.
    The game with the opponent in seat 1 had 9,765,306,248 information sets and 26,879,972,986 infoset-actions,
    and the game with the opponent in seat 2 had 10,986,105,934 information sets and 30,285,810,764 infoset-actions.
    While the average strategy is the component of CFR that converges to an equilibrium, for
    this pair of strategies we ran only 80 billion and 84 billion iterations of PureCFR respectively, and
    for games of this size we would anticipate continued improvement up to 300 billion or more iterations.  Instead of
    the average strategy, this agent uses the current strategy, which does not converge to equilibrium
    but has been demonstrated by Gibson to improve in-game performance much more quickly [1, Section 4.4.3].


    [1] Richard Gibson. Regret Minimization in Games and the Development of Champion Multiplayer Computer Poker-Playing Agents. PhD Thesis. University of Alberta, 2013.
    [2] Michael Johanson, Neil Burch, Richard Valenzano, and Michael Bowling. Evaluating state-space abstractions in extensive-form games. In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 271–278, 2013.

Hyperborean (total bankroll)

  • Team Name: University of Alberta
  • Team Members: Michael Bowling, Duane Szafron, Rob Holte, Nolan Bard, Neil Burch, Richard Gibson, John Hawkin, Michael Johanson, Trevor Davis, Josh Davidson, Dustin Morrill
  • Affiliation: University of Alberta
  • Location: Edmonton, Alberta, Canada
  • Dynamic Agent
  • Technique: Hyperborean2014-2pn-TBR is an implicit modelling agent [2] consisting of three abstract strategies. All strategies were generated using the Counterfactual Regret Minimization (CFR) algorithm [10] with imperfect recall abstractions [9]. We also abstract the raise action to a number of bets relative to the pot size: each strategy makes raises equal to 0.5, 0.75, 1, 1.5, 3, 6, 11, 20, or 40 times the pot size, or goes all-in. The portfolio of strategies for the agent consists of:

    1) A Nash equilibrium approximation

    To create the abstract game for this strategy, we first partitioned the betting sequences into an "important" part and an "unimportant" part. Importance was determined by the frequency with which one of our preliminary 2-player no-limit programs faced a decision at that betting sequence in self-play, and by the number of chips in the pot. We then employed a different granularity of abstraction for each part of this partition. The unimportant part used 169, 3700, 3700, and 3700 buckets per betting round respectively, while the important part used 169, 180,000, 1,530,000, and 1,680,000 buckets per betting round respectively. Buckets were calculated according to public card textures and the k-means Earthmover and k-means OCHS buckets recently presented by Johanson et al. [8].  By forgetting previous card information and rebucketing on every round [9], this yields an imperfect recall abstract game. The strategy profile of this abstract game was computed with approximately 498 billion iterations of PureCFR [5, Section 5.5], a recent CFR variant developed by Oskari Tammelin.  This type of strategy is also known as a "dynamic expert strategy" [6].

    2) A data biased response to aggregate data of 2011 and 2012 ACPC competitors

    One exploitive response in the portfolio was created using data biased robust counter strategies [7] on aggregate data from all of the agents in the 2011 and 2012 heads-up no-limit ACPC events. It uses the same betting abstraction as the above Nash equilibrium approximation, but its card abstraction consists of 169, 9000, 9000, and 3700 k-means Earthmover and k-means OCHS buckets per betting round, uniformly across the game tree.  An asymmetric abstraction is used for the data biased response's frequentist opponent model [3]: the model's abstraction ignores card information and models agents only on their abstract betting.

    3) A data biased response to aggregate data of some 2013 ACPC competitors

    The second exploitive response uses the same abstract game as the previous DBR, but only uses data from agents that the 2013 Hyperborean TBR entry did not beat by at least 750 mbb/g.

    A mixture of these strategies is dynamically generated using a slightly modified Exp4-like algorithm [1], where the reward vector for the experts/strategies is computed using importance sampling over the individual strategies [4].
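
    A minimal sketch of that mixing loop, with a hypothetical estimate_values standing in for the importance-sampling estimator of [4]:

      # Exp4-style mixture over a strategy portfolio. `estimate_values` is a
      # hypothetical placeholder for the importance-sampling estimator of [4].
      import numpy as np

      rng = np.random.default_rng(0)

      def estimate_values(hand: int) -> np.ndarray:
          """Stand-in: off-policy value estimate for every portfolio strategy."""
          return rng.normal(loc=[0.0, 0.1, -0.05], scale=1.0)

      def mix_weights(cumulative: np.ndarray, eta: float = 0.05) -> np.ndarray:
          """Exp4-style exponential weights over the portfolio."""
          w = np.exp(eta * (cumulative - cumulative.max()))
          return w / w.sum()

      cumulative = np.zeros(3)                   # one entry per strategy
      for hand in range(3000):                   # one ACPC match
          probs = mix_weights(cumulative)
          chosen = rng.choice(len(probs), p=probs)   # strategy played this hand
          cumulative += estimate_values(hand)        # credit all strategies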


    [1] P Auer, N Cesa-Bianchi, Y Freund, and R.E Schapire. Gambling in a rigged casino: The adversarial multi-armed bandit problem. Proceedings of the 36th Annual Symposium on Foundations of Computer Science, 1995.

    [2] Nolan Bard, Michael Johanson, Neil Burch, Michael Bowling. Online Implicit Agent Modelling. In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2013.

    [3] Nolan Bard, Michael Johanson, Michael Bowling.  Asymmetric Abstractions for Adversarial Settings.  In Proceedings of the Thirteenth International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2014.

    [4] Michael Bowling, Michael Johanson, Neil Burch, and Duane Szafron. Strategy Evaluation in Extensive Games with Importance Sampling. In Proceedings of the 25th Annual International Conference on Machine Learning (ICML), 2008.

    [5] Richard Gibson. Regret Minimization in Games and the Development of Champion Multiplayer Computer Poker-Playing Agents. PhD Thesis. University of Alberta, 2013.

    [6] Richard Gibson and Duane Szafron.  On Strategy Stitching in Large Extensive Form Multiplayer Games.  In Proceedings of the Twenty-Fifth Conference on Neural Information Processing Systems (NIPS), 2011.

    [7] Michael Johanson and Michael Bowling. Data Biased Robust Counter Strategies. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), 2009.

    [8] Michael Johanson, Neil Burch, Richard Valenzano, and Michael Bowling. Evaluating state-space abstractions in extensive-form games. In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 271–278, 2013.

    [9] Kevin Waugh, Martin Zinkevich, Michael Johanson, Morgan Kan, David Schnizlein, and Michael Bowling. A Practical Use of Imperfect Recall. In Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (SARA), 2009.

    [10] Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. Regret minimization in games with incomplete information. In Advances in Neural Information Processing Systems 20 (NIPS), 2007.

ArizonaStu (KEmpfer)

  • Team Name: KEmpfer
  • Team Members: Eneldo Loza Mencia, Julian Prommer
  • Affiliation: Knowledge Engineering Group - Technische Universität Darmstadt
  • Location: Darmstadt, Germany
  • Non-dynamic Agent
  • Technique: The agent implements and follows a list of expert rules. Additional opponent statistics can be collected and used in the rules, but we currently do not make use of this option. The backup strategy, if no expert rule matches, is to play according to all-in equity and pot odds.
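
    A minimal sketch of that backup rule, calling whenever all-in equity meets the pot odds (numbers illustrative):

      # Backup rule: call when all-in equity beats the pot odds.
      def pot_odds(to_call: int, pot: int) -> float:
          """Required equity: our call as a fraction of the final pot."""
          return to_call / (pot + to_call)

      def fallback_action(equity: float, to_call: int, pot: int) -> str:
          """`equity` is the hand's all-in showdown equity (e.g. from
          enumeration); `pot` already includes the opponent's bet."""
          return "call" if equity >= pot_odds(to_call, pot) else "fold"

      # Facing a 50-chip bet (pot now 150) we need 50/200 = 25% equity.
      print(fallback_action(equity=0.40, to_call=50, pot=150))  # -> call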

Little Rock

  • Team Name: Little Rock
  • Team Members: Rod Byrnes
  • Affiliation: Independent
  • Location: Goonellabah, NSW, Australia
  • Non-dynamic Agent
  • Technique: External sampling MCCFR approach, virtually the same as last year but with a few enhancements to the card abstraction and action abstraction techniques.
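
    For reference, external sampling MCCFR is small enough to show in full on Kuhn poker, the standard toy game; this is a generic textbook sketch, not Little Rock's implementation, which applies the same scheme to abstracted hold'em:

      # External-sampling MCCFR on Kuhn poker (see Lanctot et al. below).
      import random
      from itertools import permutations

      ACTIONS = "pb"                        # p = pass/fold, b = bet/call
      regret, strat_sum = {}, {}

      def utility(cards, h):                # chips won by player 0 at a terminal
          if h == "bp":  return 1           # player 1 folds to a bet
          if h == "pbp": return -1          # player 0 folds to a bet
          stake = 2 if "b" in h else 1      # called bets play for 2 chips
          return stake if cards[0] > cards[1] else -stake

      def strategy(I):
          r = [max(x, 0.0) for x in regret.setdefault(I, [0.0, 0.0])]
          s = sum(r)
          return [x / s for x in r] if s > 0 else [0.5, 0.5]

      def walk(cards, h, traverser):
          if h in ("pp", "bb", "bp", "pbp", "pbb"):    # terminal histories
              u = utility(cards, h)
              return u if traverser == 0 else -u
          player = len(h) % 2
          I = str(cards[player]) + h                   # private card + betting
          sigma = strategy(I)
          if player == traverser:                      # enumerate own actions
              vals = [walk(cards, h + a, traverser) for a in ACTIONS]
              ev = sigma[0] * vals[0] + sigma[1] * vals[1]
              for i in (0, 1):
                  regret[I][i] += vals[i] - ev
              return ev
          S = strat_sum.setdefault(I, [0.0, 0.0])      # accumulate avg strategy
          for i in (0, 1):
              S[i] += sigma[i]
          a = random.choices(ACTIONS, weights=sigma)[0]  # sample opponent action
          return walk(cards, h + a, traverser)

      deals = list(permutations(range(3), 2))
      for _ in range(200_000):
          for traverser in (0, 1):
              walk(random.choice(deals), "", traverser)

      s = strat_sum["0"]  # first player holding the worst card
      print("bluff-bet frequency:", s[1] / sum(s))  # equilibrium: in [0, 1/3]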


    Monte Carlo Sampling for Regret Minimization in Extensive Games. Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. In Advances in Neural Information Processing Systems 22 (NIPS), pp. 1078-1086, 2009.

Lucifer

  • Team Name: PokerCPT
  • Team Members: Luis Filipe Teofilo
  • Affiliation: University of Porto, Artificial Intelligence and Computer Science Laboratory
  • Location: Porto, Portugal
  • Dynamic Agent
  • Technique: The base agent's strategies are Nash equilibrium (NE) approximations. Several NE strategies were computed, and the agent switches between them to make opponent modelling difficult (especially on Kuhn3P). To compute the NE strategies, an implementation of CFR was used. This implementation greatly reduces the game tree by removing decisions at chance nodes where the agent knows that it has a very high or very low probability of winning. For multiplayer poker, the CFR implementation abstracts game sequences. The methodology for grouping card buckets was based on grouping buckets by their utility in smaller games. For no-limit, the actions were also abstracted into 4 possible decisions.

Nyx

  • Team Name: Nyx
  • Team Members: Martin Schmid
  • Affiliation: Charles University
  • Location: Prague, Czech Republic
  • Non-dynamic Agent
  • Technique: An improved version of the previous Nyx, with a better action abstraction and a new automatic public-card abstraction.

PijaiBot

  • Team Name: PijaiBot
  • Team Members: Ryan Pijai
  • Affiliation: Independent
  • Location:  Orlando, FL, USA
  • Non-dynamic Agent
  • Technique: PijaiBot is an imperfect-recall, approximate Nash equilibrium agent with novel approaches to card clustering, bet translating, and opponent trapping.  All card situations that are not isomorphically similar are grouped together using k-means clustering, with the Bhattacharyya distance of expected hand-strengths as the distance measure rather than more traditional distance measures.  When opponent bet sizes do not match any of the sizes in PijaiBot's betting abstraction, PijaiBot interprets them using Soft Translation of Geometric Similarity based on pot odds, rather than on pot-relative or stack-relative bet sizes as has been done in the past.
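
    A sketch of what such a soft translator could look like; the geometric-similarity weighting below is an illustrative guess at the idea, not PijaiBot's actual formula:

      # Soft translation: an observed bet is randomly mapped to one of the two
      # abstract sizes bracketing it, weighted by geometric similarity of their
      # pot odds. The weighting here is a guess, not PijaiBot's formula.
      import random

      def pot_odds(frac: float) -> float:
          """Required equity to call a bet of `frac` times the pot."""
          return frac / (1.0 + 2.0 * frac)

      def soft_translate(frac: float, menu: list[float]) -> float:
          menu = sorted(menu)
          if frac <= menu[0]:  return menu[0]
          if frac >= menu[-1]: return menu[-1]
          if frac in menu:     return frac
          lo = max(m for m in menu if m < frac)
          hi = min(m for m in menu if m > frac)
          x, a, b = pot_odds(frac), pot_odds(lo), pot_odds(hi)
          sim_lo, sim_hi = a / x, x / b    # geometric closeness, each in (0, 1)
          return lo if random.random() < sim_lo / (sim_lo + sim_hi) else hi

      # A 0.6-pot bet against a {0.5-pot, 1-pot} menu maps mostly to 0.5-pot.
      print(soft_translate(0.6, [0.5, 1.0]))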

    PijaiBot has special-case logic for translating small bets that do not match any of its abstract bet sizes into checks and calls, and it can patch up its internal abstract betting history as needed to make those translations valid action sequences.  PijaiBot attempts to exploit agents that do not handle this and similarly tricky situations properly by occasionally overriding its own Nash equilibrium strategy suggestions with potentially more damaging actions that test and confuse opponents by inducing them into misrepresenting the true state of the game.

Prelude

  • Team Name: Prelude
  • Team Members: Tim Reiff
  • Affiliation: Unfold Poker
  • Location:  Las Vegas, NV, USA
  • Non-dynamic Agent
  • Technique: Prelude is an equilibrium strategy that implements several published techniques, including the training algorithm Pure CFR, the opponent bet translation method, and a card abstraction based on k-means clustering over hand strength distributions.  I had hoped to test an agent with some ambitious importance sampling, but that one is converging slowly...  Thus, I whipped up a more conservative entry, with a streamlined betting tree partly designed for faster training.  I snuck in a few minor experiments still, including modified EV histograms and OCHS categories for the card abstraction, selective use of purification thresholding, and some speculative bet sizing adjustments.
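
    Of those experiments, purification thresholding is simple enough to sketch: probabilities below a threshold are zeroed and the remainder renormalized (the threshold below is an illustrative choice, not Prelude's):

      # Purification thresholding: drop low-probability actions from the
      # computed strategy and renormalize. The 0.15 threshold is illustrative.
      def purify(probs: list[float], threshold: float = 0.15) -> list[float]:
          kept = [p if p >= threshold else 0.0 for p in probs]
          if sum(kept) == 0.0:              # degenerate case: keep the argmax
              kept = [float(p == max(probs)) for p in probs]
          total = sum(kept)
          return [p / total for p in kept]

      print(purify([0.70, 0.22, 0.08]))  # -> [0.7609..., 0.2391..., 0.0]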


    1. Sam Ganzfried and Tuomas Sandholm. "Tartanian5: A Heads-Up No-Limit Texas Hold'em Poker-Playing Program". In Computer Poker Symposium at the National Conference on Artificial Intelligence (AAAI), 2012.
    2. Richard Gibson. "Regret Minimization in Games and the Development of Champion Multiplayer Computer Poker-Playing Agents". PhD thesis, University of Alberta, 2014.
    3. Greg Hamerly. "Making k-means even faster".  In proceedings of the 2010 SIAM international conference on data mining (SDM 2010), April 2010.
    4. Eric Jackson. "Slumbot NL: Solving Large Games with Counterfactual Regret Minimization Using Sampling and Distributed Processing". In Computer Poker Workshop on Artificial Intelligence (AAAI), 2013.
    5. Michael Johanson, Neil Burch, Richard Valenzano, and Michael Bowling. "Evaluating State-Space Abstractions in Extensive-Form Games". In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2013.
    6. Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. "Regret Minimization in Games with Incomplete Information". In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2007.

SartreNLExp

  • Team Name: Sartre
  • Team Members: Kevin Norris, Jonathan Rubin, Ian Watson
  • Affiliation: The University of Auckland
  • Location: Auckland, New Zealand
  • Dynamic Agent
  • Technique: SartreNLExp combines an approximate Nash equilibrium strategy with exploitation capabilities. It plays a base approximate Nash equilibrium strategy that was created by imitating the play [1] of the 2013 Slumbot agent [2]. SartreNLExp also incorporates a statistical exploitation module that models the opponent online and identifies exploitable statistical anomalies in the opponent's play. When a game state arises where the statistical exploitation module is able to exploit one of the opponent's statistical anomalies, it overrides the base strategy and provides an exploitive action. Together, the base strategy and statistical exploitation module provide safe opponent exploitation, given that the opponent model is an accurate reflection of the opponent's action frequencies. The agent has been improved from its previous iteration, presented in [3]: the exploitation capabilities of the statistical exploitation module have been greatly improved and the opponent model has been entirely overhauled. Additionally, a novel decaying-history method, statistic-specific decaying history, has been implemented to ensure the opponent model can accurately reflect the frequency statistics of both static and dynamic opponents.
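
    A minimal sketch of a decaying-history frequency statistic of this general kind; the decay rate below is an illustrative assumption, presumably chosen per statistic as the method's name suggests:

      # Decaying-history frequency statistic: recent hands count more than old
      # ones, so the model can track both static and shifting opponents.
      class DecayingFrequency:
          def __init__(self, decay: float = 0.99):   # decay chosen per statistic
              self.decay, self.num, self.den = decay, 0.0, 0.0

          def observe(self, event_occurred: bool) -> None:
              """Record one opportunity, decaying all prior observations."""
              self.num = self.decay * self.num + float(event_occurred)
              self.den = self.decay * self.den + 1.0

          @property
          def rate(self) -> float:
              return self.num / self.den if self.den else 0.0

      fold_to_cbet = DecayingFrequency(decay=0.98)
      for folded in [True, True, False, True]:    # observed opportunities
          fold_to_cbet.observe(folded)
      print(round(fold_to_cbet.rate, 3))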

    1. Rubin, J., & Watson, I. (2011). Successful performance via decision generalisation in no limit Texas Hold’em. In Case-Based Reasoning Research and Development (pp. 467-481). Springer Berlin Heidelberg.
    2. Jackson, E. (2013, June). Slumbot NL: Solving Large Games with Counterfactual Regret Minimization Using Sampling and Distributed Processing. In Workshops at the Twenty-Seventh AAAI Conference on Artificial Intelligence.
    3. Norris, K., & Watson, I. (2013, August). A statistical exploitation module for Texas Hold'em: And it's benefits when used with an approximate nash equilibrium strategy. In Computational Intelligence in Games (CIG), 2013 IEEE Conference on (pp. 1-8). IEEE.

Slumbot

  • Team Name: Slumbot
  • Team Members: Eric Jackson
  • Affiliation: Independent
  • Non-dynamic Agent
  • Technique: I use Pure External CFR to compute an approximate equilibrium.  A decomposition technique described in my forthcoming workshop paper allows me to break the game tree into pieces that can be solved independently.  I employ an abstraction that uses more granularity (both more bet sizes and more buckets) at more commonly reached game states.


    "A Time and Space Efficient Algorithm for Approximately Solving Large Imperfect Information Games"; Eric Jackson; 2014; forthcoming in the Proceedings of the Workshop on Computer Poker and Imperfect Information at AAAI-14.

Tartanian7

  • Team Name: Tartanian
  • Team Members: Noam Brown, Sam Ganzfried, Tuomas Sandholm
  • Affiliation: Carnegie Mellon University
  • Location: Pittsburgh, PA, USA
  • Non-dynamic Agent
  • Technique: Tartanian7 plays an approximate Nash equilibrium strategy that was computed on Pittsburgh's shared-memory supercomputer, which has a cache-coherent Non-Uniform Memory Access (ccNUMA) architecture. We developed a new abstraction algorithm and a new equilibrium-finding algorithm that enabled us to perform a massive equilibrium computation on this architecture.

    The abstraction algorithm first clusters public flop boards, assigning each cluster to a blade of the supercomputer. The public clusters are computed using a distance function based on how often our abstraction from last year grouped hands together on the flop under different sets of public cards. Within each cluster, the algorithm then buckets the flop, turn, and river hands that are possible given one of the public flops in the cluster, using an imperfect-recall abstraction algorithm. We did not perform any abstraction for the preflop round.

    Our equilibrium-finding algorithm is a modified version of external-sampling MCCFR. It samples one pair of preflop hands per iteration. Postflop, each blade samples community cards from its own public cluster and performs MCCFR in parallel; the samples are weighted to remove bias.
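
    A minimal sketch of the kind of importance weighting that removes this bias when each blade samples only from its own cluster; the clusters, probabilities, and values below are toy placeholders:

      # Bias correction for per-cluster sampling: each blade's estimate is
      # weighted by its cluster's true probability. All numbers are toys.
      import random

      clusters = {                 # cluster -> (true prob., its flop "values")
          "A": (0.5, [1.0, 2.0]),
          "B": (0.3, [4.0]),
          "C": (0.2, [8.0, 10.0]),
      }

      def blade_estimate(name: str, n: int = 10_000) -> float:
          """One blade samples uniformly inside its cluster; the cluster's
          true probability is applied as an importance weight."""
          p_true, values = clusters[name]
          draws = [random.choice(values) for _ in range(n)]
          return p_true * sum(draws) / n      # unbiased for p_true * E[value]

      estimate = sum(blade_estimate(c) for c in clusters)  # combine blades
      exact = sum(p * sum(v) / len(v) for p, v in clusters.values())
      print(round(estimate, 3), exact)        # the two should closely agree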

    Our agent also uses a novel reverse mapping technique that compensates for the failure of CFR to fully converge and for possible overfitting of the strategies to the abstraction.