Participants: 2014 - Heads-up Limit Texas Hold'em


Cleverpiggy

  • Team Name: Cleverpiggy
  • Team Members: Allen Cunningham
  • Affiliation: Independent
  • Location: Marina del Rey, CA, US
  • Non-dynamic Agent
  • Technique: Cleverpiggy is the progeny of 20 billion iterations of chance-sampled CFR run on a quad-core Intel machine with 48 GB of RAM. She uses a card abstraction with 169; 567,528; 60,000; and 180,000 hand types on the four respective streets. Flop hands are essentially unabstracted. For the turn and river, board types are established by dividing all flops into 20 categories, each of which branches into 3 turns, which in turn branch into 3 rivers, yielding 60 turn and 180 river distinctions. Hands for each board type are then divided into 1,000 buckets using earth mover's distance and OCHS clustering (a sketch of this bucketing step follows below).
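
    The final bucketing step reduces to k-means over per-hand feature vectors. Below is a minimal sketch, assuming random stand-ins for the real earth mover's / OCHS features and with sklearn's Euclidean k-means in place of the EMD-based clustering; sizes are shrunk for speed, and nothing here is Cleverpiggy's actual code.

      # Sketch: bucket the hands of one board type with k-means.
      import numpy as np
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(0)

      # One row per canonical hand on this board type. In the real agent
      # the features are hand-strength histograms (compared with earth
      # mover's distance) or win rates against clusters of opponent hands
      # (OCHS); random rows stand in for them here.
      hand_features = rng.random((2000, 8))

      # Cleverpiggy uses 1000 buckets per board type; 50 keeps the demo fast.
      kmeans = KMeans(n_clusters=50, n_init=10, random_state=0)
      bucket_of_hand = kmeans.fit_predict(hand_features)
      print(bucket_of_hand[:10])               # bucket index of ten hands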


    Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. Regret Minimization in Games with Incomplete Information. In Advances in Neural Information Processing Systems 20 (NIPS), 2007.
    Michael Johanson, Neil Burch, Richard Valenzano, and Michael Bowling. Evaluating State-Space Abstractions in Extensive-Form Games. In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2013.

Escabeche

  • Team Members: Marv Andersen
  • Affiliation: Independent
  • Location: London, UK
  • Non-dynamic Agent
  • Technique: This bot is a neural net trained to imitate the play of previous ACPC winners.
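
    No further details were published, but imitating logged play reduces to multiclass supervised learning over (game state, observed action) pairs. Below is a minimal sketch with a single softmax layer standing in for the real network, trained on synthetic data; all shapes and names are assumptions.

      # Sketch: learn a policy that imitates logged winner actions.
      import numpy as np

      rng = np.random.default_rng(0)
      N_FEATURES, N_ACTIONS = 20, 3            # e.g. fold / call / raise

      # Synthetic stand-ins for (encoded game state, action a winner took).
      X = rng.random((5000, N_FEATURES))
      y = rng.integers(0, N_ACTIONS, 5000)

      W = np.zeros((N_FEATURES, N_ACTIONS))
      for _ in range(200):                     # gradient descent on cross-entropy
          logits = X @ W
          p = np.exp(logits - logits.max(axis=1, keepdims=True))
          p /= p.sum(axis=1, keepdims=True)
          p[np.arange(len(y)), y] -= 1.0       # softmax gradient: p - one_hot
          W -= 0.1 * (X.T @ p) / len(y)

      def act(state_features):
          # Play the action the imitation policy rates most likely.
          return int(np.argmax(state_features @ W))

      print(act(rng.random(N_FEATURES)))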

Feste

  • Team Name: Feste
  • Team Members: Francois Pays
  • Affiliation: Independent
  • Location: Paris, France
  • Dynamic Agent
  • Technique: The card abstraction uses 169, 1,500, 400, and 200 buckets for the preflop, flop, turn, and river respectively. The buckets are computed using k-means clustering over selected hand parameters, such as the expected value, standard deviation, and skewness of hand strength on the last round.

    The resulting abstraction is represented using sequence form with the imperfect recall extension and has 1.3 billion game states. It is solved using a custom interior point solver with indirect algebra [1]. The solver runs on a mid-range workstation and is GPU-accelerated with CUDA.

    Feste has two strategies at its disposal: a defensive one, close to the equilibrium but still slightly offensive, and a second, decidedly aggressive one that is consequently off the equilibrium. During the course of the game, Feste uses Thompson sampling to select the strategy best adapted to the opponent (a sketch of this selection step follows below).
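
    Below is a minimal sketch of that selection step, assuming each hand's outcome is reduced to a win/loss signal so that a Beta-Bernoulli posterior applies; Feste's actual reward model is not described.

      # Sketch: Thompson sampling over the two strategies. Each strategy
      # keeps a Beta posterior over "this strategy wins the hand"; play
      # the strategy whose sampled win rate is highest.
      import random

      class ThompsonSelector:
          def __init__(self, n_strategies=2):
              self.wins   = [1.0] * n_strategies    # Beta(1, 1) priors
              self.losses = [1.0] * n_strategies

          def choose(self):
              samples = [random.betavariate(w, l)
                         for w, l in zip(self.wins, self.losses)]
              return max(range(len(samples)), key=samples.__getitem__)

          def update(self, strategy, won):
              if won:
                  self.wins[strategy] += 1
              else:
                  self.losses[strategy] += 1

      selector = ThompsonSelector()            # 0 = defensive, 1 = aggressive
      for _ in range(1000):
          s = selector.choose()
          won = random.random() < (0.52 if s else 0.50)   # stand-in for play
          selector.update(s, won)
      print(selector.wins, selector.losses)    # the better strategy dominates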


    [1] Francois Pays. 2014. An Interior Point Approach to Large Games of Incomplete Information. Proceedings of the AAAI-2014 Workshop on Computer Poker.

Hyperborean

  • Team Name: University of Alberta
  • Team Members: Michael Bowling, Duane Szafron, Rob Holte, Nolan Bard, Neil Burch, Richard Gibson, John Hawkin, Michael Johanson, Trevor Davis, Josh Davidson, Dustin Morrill
  • Affiliation: University of Alberta
  • Location: Edmonton, Alberta, Canada
  • Dynamic Agent
  • Technique: Hyperborean2014-2pl is an implicit modelling agent [2] consisting of four abstract strategies. All strategies were generated using the Counterfactual Regret Minimization (CFR) algorithm [8] with imperfect recall card abstractions. Buckets were calculated according to public card textures and the k-means Earthmover and k-means OCHS buckets recently presented by Johanson et al. [6]. By forgetting previous card information and rebucketing on every round [7], this yields an imperfect recall abstract game.

    The portfolio of strategies for the agent consists of:

    1) An asymmetric equilibrium strategy

    An asymmetric equilibrium strategy was generated to exploit mistakes that can be made by equilibrium-based agents using smaller abstractions of the game [3]. The abstraction for the final strategy uses 169, 1,348,620, 1,521,978, and 840,000 buckets on each round, respectively. During training with CFR, the opponent uses a smaller abstraction with 169 buckets on the pre-flop and 9,000 buckets on each subsequent round.

    2) Three data biased robust counter strategies based on prior ACPC competitors

    Three strategies in the portfolio are designed to exploit specific players from prior ACPC events. Each was created as a data biased robust counter strategy [5] to data from one such competitor. An asymmetric abstraction is used for the frequentist model of the data biased response [3], placing observations of the player in a smaller abstraction than the regret minimizing portions of the strategy.

    A mixture of these strategies is dynamically generated using a slightly modified Exp4-like algorithm [1] where the reward vector for the experts/strategies is computed using importance sampling over the individual strategies [4].
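
    Below is a rough sketch of that mixing step: the textbook exponential-weights update over a per-strategy reward vector. The team's modifications to Exp4 and the importance-sampling estimator itself are described in [1] and [4]; the stand-in reward function here is an assumption.

      # Sketch: Exp4-style exponential weights over a strategy portfolio.
      import math, random

      N_STRATEGIES = 4
      ETA = 0.05                               # assumed learning rate
      log_w = [0.0] * N_STRATEGIES             # log-weight of each strategy

      def estimated_rewards():
          # Stand-in: in Hyperborean this vector comes from importance
          # sampling over the individual strategies [4].
          return [random.gauss(mu, 1.0) for mu in (0.0, 0.1, -0.1, 0.05)]

      for _ in range(10000):                   # one iteration per hand
          m = max(log_w)
          w = [math.exp(x - m) for x in log_w]
          total = sum(w)
          mixture = [x / total for x in w]     # mixture played this hand
          for i, r in enumerate(estimated_rewards()):
              log_w[i] += ETA * r              # exponential-weights update

      print([round(p, 3) for p in mixture])    # favours the best strategy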


    [1] Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. Gambling in a Rigged Casino: The Adversarial Multi-armed Bandit Problem. In Proceedings of the 36th Annual Symposium on Foundations of Computer Science (FOCS), 1995.

    [2] Nolan Bard, Michael Johanson, Neil Burch, Michael Bowling. Online Implicit Agent Modelling. In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2013.

    [3] Nolan Bard, Michael Johanson, Michael Bowling. Asymmetric Abstractions for Adversarial Settings. In Proceedings of the Thirteenth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2014.

    [4] Michael Bowling, Michael Johanson, Neil Burch, and Duane Szafron. Strategy Evaluation in Extensive Games with Importance Sampling. In Proceedings of the 25th Annual International Conference on Machine Learning (ICML), 2008.

    [5] Michael Johanson and Michael Bowling. Data Biased Robust Counter Strategies. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), 2009.

    [6] Michael Johanson, Neil Burch, Richard Valenzano, and Michael Bowling. Evaluating state-space abstractions in extensive-form games. In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 271–278, 2013.

    [7] Kevin Waugh, Martin Zinkevich, Michael Johanson, Morgan Kan, David Schnizlein, and Michael Bowling. A Practical Use of Imperfect Recall. In Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (SARA), 2009.

    [8] Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. Regret minimization in games with incomplete information. In Advances in Neural Information Processing Systems 20 (NIPS), 2007.

Lucifer

  • Team Name: PokerCPT
  • Team Members: Luis Filipe Teofilo
  • Affiliation: University of Porto, Artificial Intelligence and Computer Science Laboratory
  • Location: Porto, Portugal
  • Dynamic Agent
  • Technique: The base agent's strategies are Nash equilibrium (NE) strategies. Several NE strategies were computed, and the agent switches between them to make opponent modeling difficult (especially in Kuhn3P). The NE strategies were computed with an implementation of CFR that greatly reduces the game tree by removing decisions at chance nodes where the agent knows it has a very high or very low probability of winning (a sketch of this pruning idea follows below). For multiplayer poker, the CFR implementation also abstracts game sequences; card buckets were grouped according to their utility in smaller games. For no-limit, actions were additionally abstracted into four possible decisions.
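
    Below is a minimal sketch of the pruning idea, with assumed thresholds and a toy tree representation; nothing here is Lucifer's actual code.

      # Sketch: before solving, chance outcomes that are almost surely
      # won or lost are replaced by terminal payoffs, so CFR never
      # iterates over those subtrees.
      WIN_HI, WIN_LO = 0.95, 0.05              # assumed cut-offs

      def prune_chance_node(outcomes):
          """outcomes: (probability, win_prob, subtree) triples for one
          chance node. Returns the subtrees still worth solving plus a
          terminal value absorbing the near-decided outcomes."""
          kept, terminal_value = [], 0.0
          for prob, win_prob, subtree in outcomes:
              if win_prob >= WIN_HI:
                  terminal_value += prob       # treated as a locked-up win
              elif win_prob <= WIN_LO:
                  terminal_value -= prob       # treated as a certain loss
              else:
                  kept.append((prob, subtree)) # CFR still iterates here
          return kept, terminal_value

      kept, v = prune_chance_node(
          [(0.5, 0.99, 'A'), (0.3, 0.50, 'B'), (0.2, 0.01, 'C')])
      print(kept, v)                           # [(0.3, 'B')] 0.3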

PokerStar

  • Team Name: PokerStar
  • Team Members: Ingo Kaps
  • Location: Frankfurt, Hessen, Germany
  • Dynamic Agent
  • Technique: The PokerStar Bot is written in Pascal.

    The PokerStar bot plays a fixed preflop mixture of 2.5% fold, 95% call, and
    2.5% raise, so that other bots cannot easily model it.

    After the preflop, the bot calculates an opponent-based squared weighted hand
    strength and uses it to index an optimized static-bucket CFR table.
    If the opponent's RaiseRatio is too low, a rule-based strategy is used instead.
    If the opponent's RaiseRatio is very low, a Selby preflop strategy is additionally used.
    If the opponent's RaiseRatio is too high, the bot will always call when the
    opponent raises, and will raise whenever the opponent checks. (A sketch of
    this switching logic follows below.)
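
    Below is a minimal sketch of that switching logic, with assumed RaiseRatio cut-offs and hypothetical helpers standing in for the rule-based, Selby, and CFR-table components (the real bot is Pascal, not Python).

      # Sketch: PokerStar's strategy switch. All names and thresholds
      # are illustrative stand-ins, not the actual source.
      import random

      def rule_based_action():                 return 'call'
      def selby_preflop_action():              return 'call'
      def cfr_table_action(hand_strength_sq):
          return 'raise' if hand_strength_sq > 0.5 else 'call'

      LOW, VERY_LOW, HIGH = 0.10, 0.03, 0.60   # assumed RaiseRatio cut-offs

      def preflop_action(raise_ratio):
          if raise_ratio < VERY_LOW:
              return selby_preflop_action()    # Selby chart vs. very passive foes
          r = random.random()                  # the fixed 2.5 / 95 / 2.5 mixture
          return 'fold' if r < 0.025 else ('raise' if r > 0.975 else 'call')

      def postflop_action(raise_ratio, opponent_raised, hand_strength_sq):
          if raise_ratio > HIGH:               # hyper-aggressive opponent
              return 'call' if opponent_raised else 'raise'
          if raise_ratio < LOW:                # passive opponent: rules take over
              return rule_based_action()
          return cfr_table_action(hand_strength_sq)

      print(preflop_action(0.2), postflop_action(0.2, True, 0.7))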

ProPokerTools

  • Team Name: ProPokerTools
  • Team Members: Dan Hutchings
  • Affiliation: ProPokerTools
  • Location: Lakewood, Colorado, US
  • Non-dynamic Agent
  • Technique: This HULHE agent was created using established methods: regret minimization, partial recall, etc.

    Last year, I gave myself a constraint in building my AI agents: all agents were
    created on a single machine that cost less than $1,000. This year, I have loosened that constraint to allow $1,000 worth of rented compute time in the 'cloud'.

    This year's entry has been improved in three areas: size (9 times larger), build time (9 times longer), and abstraction quality. Tests using the 2013 benchmark server show this agent would likely have placed third or fourth in the instant run-off competition had it been entered last year. Additional improvements have been developed but were not ready in time for this year's competition.

Slugathorus

  • Team Name: Slugathorus
  • Team Members: Daniel Berger
  • Affiliation: University of South Wales
  • Dynamic Agent
  • Technique: The agent combines a precomputed approximate Nash equilibrium strategy, generated by public chance sampled MCCFR, with a new modelling technique designed to identify when the opponent is making mistakes and to exploit them (a compact CFR sketch follows below).
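
    The public chance sampled MCCFR variant is described in the Johanson reference below. As a compact stand-in that shares the same regret-matching core, here is chance-sampled CFR on Kuhn poker; this is the standard toy example, not the agent's code.

      # Sketch: chance-sampled CFR on Kuhn poker (3-card deck, 1-bet cap).
      import random
      from collections import defaultdict

      ACTIONS = ['p', 'b']                     # pass / bet
      regret_sum   = defaultdict(lambda: [0.0, 0.0])
      strategy_sum = defaultdict(lambda: [0.0, 0.0])

      def current_strategy(info_set):
          # Regret matching: play in proportion to positive regret.
          r = [max(x, 0.0) for x in regret_sum[info_set]]
          total = sum(r)
          return [x / total for x in r] if total > 0 else [0.5, 0.5]

      def cfr(cards, history, p0, p1):
          player = len(history) % 2
          # Terminal payoffs, from the perspective of the player to act.
          if history.endswith('bp'):
              return 1.0                       # the opponent folded
          if history.endswith('pp') or history.endswith('bb'):
              stake = 2.0 if history.endswith('bb') else 1.0
              return stake if cards[player] > cards[1 - player] else -stake
          info_set = str(cards[player]) + history
          strat = current_strategy(info_set)
          util, node_util = [0.0, 0.0], 0.0
          for i, a in enumerate(ACTIONS):
              if player == 0:
                  util[i] = -cfr(cards, history + a, p0 * strat[i], p1)
              else:
                  util[i] = -cfr(cards, history + a, p0, p1 * strat[i])
              node_util += strat[i] * util[i]
          reach_me, reach_opp = (p0, p1) if player == 0 else (p1, p0)
          for i in range(2):
              regret_sum[info_set][i]   += reach_opp * (util[i] - node_util)
              strategy_sum[info_set][i] += reach_me * strat[i]
          return node_util

      for _ in range(50000):
          cfr(random.sample([1, 2, 3], 2), '', 1.0, 1.0)   # sample the deal

      for info_set in sorted(strategy_sum):    # average strategy ~ equilibrium
          s = strategy_sum[info_set]
          print(info_set, [round(x / sum(s), 2) for x in s])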

    "Modelling Player Weakness in Poker". (Berger, 2013)
    "Efficient Nash Equilibrium Approximation through Monte Carlo Counterfactual Regret Minimization". (Johanson, 2012)
    "Regret Minimization in Games With Incomplete Information". (Zinkevich, 2007)

SmooCT

  • Team Name: SmooCT
  • Team Members: Johannes Heinrich
  • Affiliation: University College London
  • Non-dynamic Agent
  • Technique: SmooCT was trained from self-play Monte-Carlo tree search, using Smooth UCT [2]. The resulting strategy aims to approximate a Nash equilibrium. The agent uses an imperfect recall abstraction [1] based on an equidistant discretisation of expected hand strength squared values. The abstraction uses 169, 2000, 1000 and 600 buckets for the four betting rounds respectively.
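
    Below is a minimal sketch of the two easily shown ingredients: the equidistant E[HS²] bucketing, and a Smooth-UCT-style action choice that plays UCB1 with probability eta and otherwise samples from the average (visit-count) strategy [2]. The constants are assumptions.

      # Sketch: equidistant E[HS^2] bucketing and Smooth UCT selection.
      import math, random

      def ehs2_bucket(ehs2, n_buckets):
          # Equidistant discretisation of expected hand strength squared,
          # e.g. n_buckets = 2000 / 1000 / 600 on flop / turn / river.
          return min(int(ehs2 * n_buckets), n_buckets - 1)

      def smooth_uct_select(q, n, eta=0.9, c=2.0):
          # q[a]: mean return of action a; n[a]: visit count of action a.
          n_total = sum(n)
          if random.random() < eta:
              # Plain UCB1 selection.
              return max(range(len(q)),
                         key=lambda a: q[a] + c * math.sqrt(
                             math.log(n_total + 1) / (n[a] + 1e-9)))
          # Otherwise sample in proportion to visit counts, an empirical
          # estimate of the average strategy.
          r = random.random() * n_total
          for a, count in enumerate(n):
              r -= count
              if r <= 0:
                  return a
          return len(n) - 1

      print(ehs2_bucket(0.42, 1000),
            smooth_uct_select(q=[0.1, 0.3, -0.2], n=[10, 5, 2]))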

    [1] Kevin Waugh, Martin Zinkevich, Michael Johanson, Morgan Kan, David Schnizlein, and Michael Bowling. "A Practical Use of Imperfect Recall". Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (SARA), 2009.
    [2] Johannes Heinrich and David Silver. "Self-Play Monte-Carlo Tree Search in Computer Poker". To appear in 2014.

Terrible Terrance

  • Team Members: Jon Parker
  • Affiliation: Georgetown University and Johns Hopkins University
  • Non-dynamic Agent
  • Technique: The solution was built using sparse dynamic programming. The "dynamic programming" portion of the solution comes from reusing precomputed "PayoffMatrix" and "StrategyBlock" objects. Saving the PayoffMatrices allows the bot to accurately estimate upstream payoffs with relatively little computation. Saving the StrategyBlock objects allows "new" blocks to be initialized with solutions that already reflect some CFR-style iterations. No board abstraction is used, because the "best" StrategyBlock for a betting round is found by looking up the nearest StrategyBlock given the relevant BettingSequence and BoardSequence (a sketch of this caching scheme follows below).
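
    Below is a minimal sketch of the caching scheme; the key encodings, distance function, and block contents are assumptions, since the real BettingSequence and BoardSequence types are not described.

      # Sketch: StrategyBlocks cached by (betting, board) sequence, with
      # "new" blocks warm-started from the nearest cached block.
      strategy_cache = {}                      # (betting_seq, board_seq) -> block

      def distance(key_a, key_b):
          # Hypothetical nearness: count positions where either the
          # betting sequence or the board sequence differs.
          return (sum(a != b for a, b in zip(key_a[0], key_b[0])) +
                  sum(a != b for a, b in zip(key_a[1], key_b[1])))

      def lookup_block(betting_seq, board_seq):
          key = (betting_seq, board_seq)
          if key in strategy_cache:
              return strategy_cache[key]       # exact hit: reuse outright
          if strategy_cache:                   # warm start from nearest block,
              nearest = min(strategy_cache, key=lambda k: distance(k, key))
              block = dict(strategy_cache[nearest])  # already part-solved
          else:
              block = {}                       # cold start
          strategy_cache[key] = block
          return block

      lookup_block('crc', 'AhKs2d')['raise_freq'] = 0.31
      print(lookup_block('crr', 'AhKs2d'))     # warm-started from 'crc' block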

    My bot folds very few hands preflop (compared with winners of prior HULHE competitions) and also open-calls a decent fraction of the time. I am somewhat concerned that the open-calling behavior is indicative of an error somewhere in the bot. However, I have searched very hard for an error and haven't found one. Moreover, the overall "game value" I computed is almost identical to the "game value" Eric Jackson (the 2012 winner) found; I confirmed this with him by email.