Participants: 2012 - Heads-up No-limit Texas Hold'em

Heads-up No-limit Texas Hold'em

Azure Sky

  • Team Name: Azure Sky Research, Inc.
  • Team Leader: Eric Baum
  • Team Members: Eric Baum, Chick Markley, Dennis Horte
  • Affiliation: Azure Sky Research Inc.
  • Location: Berkeley, CA, US
  • Technique:
    SARSA-trained neural nets, k-armed bandits, and secret sauce.
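
    Since the entry names only its ingredients, the sketch below illustrates one of them: a k-armed bandit with an epsilon-greedy rule choosing among candidate playing styles. The arm labels, epsilon, and the reward model are hypothetical, not Azure Sky's actual design.

      import random

      # Minimal epsilon-greedy k-armed bandit: each "arm" stands in for a
      # candidate policy; rewards would come from hands played with it.
      class EpsilonGreedyBandit:
          def __init__(self, arms, epsilon=0.1):
              self.epsilon = epsilon
              self.counts = {a: 0 for a in arms}
              self.values = {a: 0.0 for a in arms}

          def select(self):
              if random.random() < self.epsilon:
                  return random.choice(list(self.counts))   # explore
              return max(self.values, key=self.values.get)  # exploit

          def update(self, arm, reward):
              self.counts[arm] += 1
              # incremental mean of the rewards observed for this arm
              self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

      bandit = EpsilonGreedyBandit(["aggressive", "balanced", "passive"])
      true_mean = {"aggressive": 0.2, "balanced": 0.5, "passive": 0.1}
      for _ in range(1000):
          arm = bandit.select()
          bandit.update(arm, random.gauss(true_mean[arm], 1.0))
      print(max(bandit.values, key=bandit.values.get))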

dcubot

  • Team Name: dcubot
  • Team Leader: Neill Sweeney
  • Team Members: Neill Sweeney, David Sinclair
  • Affiliation: School of Computing, Dublin City University
  • Location: Dublin 9, Ireland.
  • Technique:
    The bot uses a structure like a neural net to generate its own actions. A hidden Markov model is used to interpret actions, i.e. to read an opponent's hand. The whole system is then trained by self-play.
    For any decision, the range of betting between a min-bet and all-in is divided into at most twelve sub-ranges. The structure then selects a fold, call, min-bet, all-in, or one of these sub-ranges. If a sub-range is selected, the actual raise amount is drawn from a quadratic distribution between the end-points of the sub-range. The end-points of the sub-ranges are learnt using the same reinforcement learning algorithm as the rest of the structure.
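
    The exact shape of the quadratic distribution is not specified above, so the sketch below assumes a density proportional to the squared distance from the lower end-point and samples it by inverting the CDF; the stack sizes and the chosen sub-range are illustrative only.

      import random

      def sample_quadratic_raise(low, high):
          # Density assumed proportional to (x - low)^2 on [low, high];
          # inverse-CDF sampling gives x = low + (high - low) * U**(1/3).
          u = random.random()
          return low + (high - low) * u ** (1.0 / 3.0)

      # Illustrative use: split the min-bet-to-all-in range into 12 sub-ranges.
      min_bet, all_in = 100, 20000
      edges = [min_bet + i * (all_in - min_bet) / 12 for i in range(13)]
      low, high = edges[3], edges[4]        # suppose the net picked this sub-range
      print(round(sample_quadratic_raise(low, high)))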

hugh

  • Team Name: hugh
  • Team Leader: Stan Sulsky
  • Team Members: Stan Sulsky, Ben Sulsky
  • Affiliation: Independent
  • Location: NY, US & Toronto, Ont, CA
  • Technique:
    Ben (poker player and son) attempts to teach Stan (programmer and father) to play poker. Stan attempts to realize Ben's ideas in code.

    More specifically, pure strategies are used throughout. Play is based on range-vs-range EV calculations. Preflop ranges are deduced by opponent modelling during play. Subsequent decisions are based on a minimax search of the remaining game tree, coupled with some tactical considerations.
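
    The sketch below shows the shape of a range-vs-range EV calculation for a call decision. The hand labels, weights, and equity table are made up, and card-removal effects are ignored; it only illustrates the idea described above.

      from itertools import product

      def range_vs_range_ev(hero_range, villain_range, equity, pot, to_call):
          # hero_range / villain_range: {hand: weight}; equity(h, v) is the
          # probability hero's hand h beats villain's hand v.
          total_weight, total_ev = 0.0, 0.0
          for (h, wh), (v, wv) in product(hero_range.items(), villain_range.items()):
              w = wh * wv
              eq = equity(h, v)
              ev = eq * (pot + to_call) - (1 - eq) * to_call   # EV of calling
              total_weight += w
              total_ev += w * ev
          return total_ev / total_weight if total_weight else 0.0

      eq_table = {("AA", "KK"): 0.82, ("AA", "QQ"): 0.82,
                  ("TT", "KK"): 0.19, ("TT", "QQ"): 0.19}
      print(round(range_vs_range_ev({"AA": 1.0, "TT": 1.0}, {"KK": 0.6, "QQ": 0.4},
                                    lambda h, v: eq_table[(h, v)], pot=100, to_call=50), 1))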

Hyperborean2pNL

  • Team Name: University of Alberta
  • Team Leader: Michael Bowling
  • Team Members: Michael Bowling, Duane Szafron, Rob Holte, Chris Archibald, Michael Johanson, Nolan Bard, Johnny Hawkin, Richard Gibson, Neil Burch, Parisa Mazrooei, Josh Davidson
  • Affiliation: University of Alberta
  • Location: Edmonton, Alberta, Canada
  • Technique:
    Our 2-player no-limit bot was built using a variant of Counterfactual Regret Minimization (CFR) ([3], [4]) applied to a specially designed betting abstraction of the game. Using an algorithm similar to CFR, a different bet size is chosen for each betting sequence in the game ([1], [2]). The card abstraction buckets hands and public cards together using imperfect recall, allowing 18630 possible buckets on each of the flop, turn, and river. A minimal sketch of the regret-matching update at the core of CFR appears at the end of this entry.
  • References and related papers:
    • [1] Hawkin, J.; Holte, R.; and Szafron, D. 2011. Automated action abstraction of imperfect information extensive-form games. In AAAI, 681–687.
    • [2] Hawkin, J.; Holte, R.; and Szafron, D. 2012. Using sliding windows to generate action abstractions in extensive-form games. To appear in AAAI.
    • [3] Johanson, M.; Bard, N.; Lanctot, M.; Gibson, R.; and Bowling, M. 2012. Efficient Nash equilibrium approximation through Monte Carlo counterfactual regret minimization. In AAMAS.
    • [4] Zinkevich, M.; Johanson, M.; Bowling, M.; and Piccione, C. 2008. Regret minimization in games with incomplete information. In NIPS.
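
    As noted above, the sketch below illustrates the regret-matching update that drives CFR, applied here to rock-paper-scissors rather than to the team's abstracted no-limit game; it demonstrates the general technique only, not Hyperborean's implementation.

      import random

      ACTIONS = ["rock", "paper", "scissors"]
      BEATS = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}

      def payoff(a, b):
          # Payoff to the player choosing a against a player choosing b.
          if a == b:
              return 0
          return 1 if (a, b) in BEATS else -1

      def strategy_from(regrets):
          # Regret matching: play each action in proportion to its positive regret.
          positive = [max(r, 0.0) for r in regrets]
          total = sum(positive)
          return [p / total for p in positive] if total > 0 else [1 / 3] * 3

      def train(iterations=20000):
          regrets = [[0.0] * 3, [0.0] * 3]       # one regret table per player
          strat_sum = [[0.0] * 3, [0.0] * 3]
          for _ in range(iterations):
              strats = [strategy_from(r) for r in regrets]
              picks = [random.choices(ACTIONS, weights=s)[0] for s in strats]
              for p in range(2):
                  me, opp = picks[p], picks[1 - p]
                  for i, alt in enumerate(ACTIONS):
                      regrets[p][i] += payoff(alt, opp) - payoff(me, opp)
                  strat_sum[p] = [a + b for a, b in zip(strat_sum[p], strats[p])]
          return [round(x / sum(strat_sum[0]), 3) for x in strat_sum[0]]

      print(train())   # the average strategy approaches the uniform equilibrium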

LittleRock

  • Team Name: LittleRock
  • Team Leader: Rod Byrnes
  • Team Members: Rod Byrnes
  • Affiliation: Independent
  • Location: Lismore, Australia
  • Technique:
    LittleRock uses an external sampling Monte Carlo CFR approach with imperfect recall. Additional RAM was available for training the agent entered into this year's competition, which allowed for a more fine-grained card abstraction, but the algorithm is otherwise largely unchanged. One last-minute addition this year is a no-limit agent. A generic card-bucketing sketch appears at the end of this entry.

    The no-limit agent has 4,491,849 information sets, the heads-up limit agent has 11,349,052 information sets and the limit 3-player agent has 47,574,530 information sets. In addition to card abstractions, the 3-player and no-limit agents also use a form of state abstraction to make the game size manageable.
  • References and related papers:
    • Monte Carlo Sampling for Regret Minimization in Extensive Games. Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. In Advances in Neural Information Processing Systems 22 (NIPS), pp. 1078–1086, 2009.
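
    The bucketing metric itself is not described above, so the sketch below only illustrates the general idea of a card abstraction: hands are grouped into a fixed number of buckets by percentile of some strength estimate. The strength numbers and bucket count are placeholders, not LittleRock's actual abstraction.

      def bucket_hands(strengths, num_buckets):
          # strengths: {hand label: scalar strength estimate computed elsewhere}.
          # Hands are ranked and split into num_buckets percentile groups.
          ranked = sorted(strengths, key=strengths.get)
          return {hand: rank * num_buckets // len(ranked)
                  for rank, hand in enumerate(ranked)}

      # Toy usage with made-up preflop strength numbers.
      print(bucket_hands({"72o": 0.35, "T9s": 0.58, "AKs": 0.67, "AA": 0.85}, 2))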

Lucky7_12

  • Team Name: Lucky7_12
  • Team Leader: Bojan Butolen
  • Team Members: Bojan Butolen, Gregor Vohl
  • Affiliation: University of Maribor
  • Location: Maribor, Slovenia
  • Technique:
    We have developed a multi-agent system that uses 8 strategies during gameplay. By identifying the state of the game, our system chooses a set of strategies that have proved most profitable against a set of training agents. The final decision of the system is made by averaging the decisions of the individual agents.

    The 8 agents included in our system are mostly rule-based. The rules for each individual agent were constructed using different knowledge bases (various match logs, expert knowledge, observed human play, ...) and different abstraction definitions for cards and actions. After a set of test matches in which each agent dueled against the other agents in the system, we determined that none of the included agents represents a strictly inferior or superior strategy (each agent lost to at least one of the other agents and won at least one match). A minimal decision-averaging sketch appears at the end of this entry.
  • References and related papers:
    • A submission to the Poker Symposium was made with the title: Combining Various Strategies In A Poker Playing Multi Agent System
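
    As mentioned above, here is a minimal sketch of the decision-averaging step. The agent recommendations and the combination rule (a weighted vote on the action, with a weight-averaged raise size) are a hedged reading of the description, not the team's exact rule.

      from collections import defaultdict

      def average_decision(recommendations):
          # recommendations: list of (action, weight, raise_amount) tuples from
          # the agents selected for the current game state.
          weight_by_action = defaultdict(float)
          amount_by_action = defaultdict(float)
          for action, weight, amount in recommendations:
              weight_by_action[action] += weight
              amount_by_action[action] += weight * amount
          best = max(weight_by_action, key=weight_by_action.get)
          size = amount_by_action[best] / weight_by_action[best]
          return best, round(size)

      # Three of the eight agents vote on the current decision point.
      print(average_decision([("raise", 1.0, 300), ("raise", 1.0, 450), ("call", 1.0, 0)]))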

Neo Poker Bot

  • Team Name: Neo Poker Laboratory
  • Team Leader: Alexander Lee
  • Team Members: Alexander Lee
  • Affiliation: Independent
  • Location: Spain
  • Technique:
    Our range of computer players was developed to play against humans. The AI was trained on real-money hand-history logs from top poker rooms. The AI logic employs different combinations of neural networks, regret minimization and gradient-search equilibrium approximation, decision trees, and recursive search methods, as well as expert algorithms from top players in different games of poker.
    Our computer players have been tested against humans and demonstrated strong results over 100 million hands. The AI was not optimized to play against computer players.

SartreNL

  • Team Name: Sartre
  • Team Leader: Jonathan Rubin
  • Team Members: Jonathan Rubin, Ian Watson
  • Affiliation: University of Auckland
  • Location: Auckland, New Zealand
  • Technique:
    SartreNL uses a case-based approach to play No Limit Texas Hold'em. Hand history data from the previous years' top agents is encoded into cases. When it is time for SartreNL to make a betting decision, a case capturing the current game state is created and the case-base is searched for similar cases. The solutions to similar past cases are then re-used for the current situation. A minimal retrieval sketch appears at the end of this entry.
  • References and related papers:
    • Jonathan Rubin and Ian Watson. (2011). Successful Performance via Decision Generalisation in No Limit Texas Hold'em. In Case-Based Reasoning. Research and Development, 19th International Conference on Case-Based Reasoning, ICCBR 2011.
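
    As noted above, here is a minimal retrieve-and-reuse sketch: the current game state is encoded as a feature vector, the most similar stored case is found, and its solution is reused. The features and similarity measure are illustrative, not SartreNL's actual case representation.

      def similarity(a, b):
          # Negative squared distance between feature vectors (higher is closer).
          return -sum((x - y) ** 2 for x, y in zip(a, b))

      def retrieve_and_reuse(case_base, query):
          best = max(case_base, key=lambda case: similarity(case["features"], query))
          return best["solution"]

      # Toy case base: features = (hand strength, bet faced / pot, betting round).
      case_base = [
          {"features": (0.90, 0.5, 2), "solution": "raise pot"},
          {"features": (0.40, 0.5, 2), "solution": "call"},
          {"features": (0.15, 1.0, 3), "solution": "fold"},
      ]
      print(retrieve_and_reuse(case_base, (0.85, 0.6, 2)))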

Spewie Louie

  • Team Name: Spewie Louie
  • Team Leader: Jon Parker
  • Team Members: Jon Parker
  • Affiliation: Georgetown University
  • Location: Washington DC, USA
  • Technique:
    The bot assumes bets can occur in 0.25x, 0.4286x, 0.6666x, 1x, 1.5x, 4x, and 9x pot increments. Nodes in the tree contain a hand range for each player, an "effectiveMatrix" that summarizes the tree below that node, and a "strategyMatrix" which is used by the "hero" of that node. Prior to the competition, a collection of 24 million matrices (half strategy and half effective) was refined while simulating roughly 12.5 million paths through the tree. This set of 24 million matrices was then trimmed down to 770k (strategy only) matrices for the competition. Any decision not supported by this set of matrices is handled by an "on line" tree learner. A sketch of snapping an observed bet to the nearest allowed pot-fraction increment appears at the end of this entry.
    During the learning process the set of effectiveMatrices and strategy matrices are stored in a ConcurrentHashMap. This gives the learning process good multi-thread behavior.
    Preflop hands are bucketed into 22 groups. Flop and Turn hands are bucketed into 8 groups. River hands are bucketed into 7 groups.
  • References and related papers:
    • Michael Johanson's Master's thesis, "Robust Strategies and Counter-Strategies: Building a Champion Level Computer Poker Player", was quite helpful, as were most of his other papers. Some of the older U. Alberta works by Darse Billings were also good reads. The book "The Mathematics of Poker" and its explanation of the AKQ game is very good.
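
    As mentioned above, the sketch below snaps an observed bet to the nearest of the listed pot-fraction increments. Mapping real bets to the closest allowed size is a common translation step and is assumed here; the entry does not state the exact mapping used.

      # Pot-fraction increments assumed by the bot, as listed above.
      INCREMENTS = [0.25, 0.4286, 0.6666, 1.0, 1.5, 4.0, 9.0]

      def nearest_increment(bet, pot):
          # Snap an observed bet to the closest allowed pot-fraction increment.
          fraction = bet / pot
          return min(INCREMENTS, key=lambda inc: abs(inc - fraction))

      print(nearest_increment(bet=120, pot=100))   # 1.0 (about a pot-sized bet)
      print(nearest_increment(bet=700, pot=100))   # 9.0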

Tartanian5

  • Team Name: Tartanian5
  • Team Leader: Sam Ganzfried
  • Team Members: Sam Ganzfried, Tuomas Sandholm
  • Affiliation: Carnegie Mellon University
  • Location: Pittsburgh, PA, 15217, United States
  • Technique:
    Tartanian5 plays a game-theoretic approximate Nash equilibrium strategy. First, it applies a potential-aware, perfect-recall, automated abstraction algorithm to group similar game states together and construct a smaller game that is strategically similar to the full game. In order to maintain a tractable number of possible betting sequences, it employs a discretized betting model, where only a small number of bet sizes are allowed at each game state. Approximate equilibrium strategies for both players are then computed using an improved version of Nesterov's excessive gap technique specialized for poker. To obtain the final strategies, we apply a purification procedure which rounds action probabilities to 0 or 1. A minimal purification sketch appears at the end of this entry.
  • References and related papers:
    • Sam Ganzfried, Tuomas Sandholm, and Kevin Waugh. 2012. Strategy purification and thresholding: Effective non-equilibrium approaches for playing large games. In AAMAS.
    • Andrew Gilpin, Tuomas Sandholm, and Troels Sorensen. 2007. Potential-aware automated abstraction of sequential games, and holistic equilibrium analysis of Texas Hold'em poker. In AAAI.
    • Andrew Gilpin, Tuomas Sandholm, and Troels Sorensen. 2008. A heads-up no-limit Texas Hold'em poker player: Discretized betting models and automatically generated equilibrium-finding programs. In AAMAS.
    • Samid Hoda, Andrew Gilpin, Javier Pena, and Tuomas Sandholm. 2010. Smoothing techniques for computing Nash equilibria of sequential games. Mathematics of Operations Research 35(2):494-512.
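
    As mentioned above, here is a minimal sketch of the purification step: the computed action probabilities at an information set are rounded so the most likely action is played with probability 1 (ties broken arbitrarily). See the Ganzfried, Sandholm, and Waugh paper for the full procedure and its thresholded variants.

      def purify(strategy):
          # Round an action-probability map to a pure strategy: the highest-
          # probability action gets probability 1, all others 0.
          best = max(strategy, key=strategy.get)
          return {action: 1.0 if action == best else 0.0 for action in strategy}

      # Toy equilibrium strategy at one information set.
      print(purify({"fold": 0.08, "call": 0.57, "raise_pot": 0.35}))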

UNI-MB_Poker

  • Team Name: UNI-MB_Poker
  • Team Leader: Ale ?ep
  • Team Members: Ale ?ep, Davor Gaberek
  • Affiliation: University of Maribor
  • Location: Maribor, Slovenia
  • Technique:
    Our poker agent concentrates on winning chips from its opponent to maximize its profit. It uses small raises even when it has good cards to lure its opponent into the game, bluffs in 5% of hands, and folds when the odds are not in its favor. We used two criteria for our agent to decide what to do: first we examine the cards we are dealt, and second we calculate the odds of winning. After combining the two results we decide what action to take.
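
    The sketch below illustrates one way the two criteria above could be combined with the 5% bluff frequency and a pot-odds check. The thresholds and the way the scores are combined are assumptions for illustration, not the team's exact rules.

      import random

      def decide(card_score, win_odds, pot_odds, bluff_rate=0.05):
          # card_score: 0..1 rating of the dealt cards; win_odds: estimated
          # probability of winning; pot_odds: call / (pot + call).
          if random.random() < bluff_rate:
              return "raise (bluff)"
          if win_odds < pot_odds and card_score < 0.3:
              return "fold"            # the odds are not in our favor
          if win_odds > 0.7:
              return "raise small"     # small raise even with good cards
          return "call"

      print(decide(card_score=0.9, win_odds=0.8, pot_odds=0.2))
      print(decide(card_score=0.2, win_odds=0.1, pot_odds=0.4))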