Participants: 2013 - Heads-up Limit Texas Hold'em

Heads-up Limit Texas Hold'em

Feste

  • Team Name: Feste
  • Team Leader: Francois Pays
  • Team Members: Francois Pays
  • Affiliation: Independent
  • Location: Paris, France
  • Technique:
    • Modelization:
      We use the sequence form to compute an equilibrium of a downsized abstract game model. There is no betting abstraction. The card abstraction uses the following bucketing parameters: preflop with 169 buckets (no abstraction), flop with 400, turn with 50 and river with 25. Buckets are computed using K-Means over hand and board parameters: current and past expected values, and deviations on future streets. Note that the sequence form assumes perfect recall, which our card abstraction does not respect; the probable consequence is some additional exploitability of the resulting strategies. The players have 263,435 and 263,316 move sets respectively, and both have 101,302 information sets. The model has 16,263,415 leaves, which is relatively small for the solver.
    • Solver:
      The sequence form could be converted into a linear complementarity problem (LCP). However, this approach does not appear to scale, since solving such an LCP requires either sparse direct or dense methods. To take full advantage of the sparsity of the constraint and payoff matrices, we keep the original problem intact, large but sparse, and use only sparse indirect methods. The full problem is a min-max problem with a bilinear objective and separable constraints [1]. It can be solved efficiently using a classical interior-point solver coupled with an inner iterative linear system solver. We use a standard log-barrier infeasible primal-dual path-following method, applied to the min-max system. The underlying Newton system in augmented form belongs to the class of large sparse saddle-point problems, to which several modern techniques apply. We use a variant of the Projected Preconditioned Conjugate Gradient (PPCG) method with an implicit-factorization preconditioner [2]. The condition number of the system matrix is kept under control using regularization and the identification and elimination of zero variables. Intermediate system matrices are never explicitly formed; the only significant memory use comes from the given constraint/payoff sparse matrices. This approach has a theoretical convergence rate of O(log(1/accuracy)) and uses a minimal amount of memory (proportional to the number of game leaves). In practice, the min-max problem is solved to competition-level accuracy in about 5 days on the hardware described below. The solver has been successfully tested up to 120 million leaves. A schematic of the min-max formulation appears after the references below.
    • Adaptation:
      A simple variation of the initial problem leads to more aggressive strategies: in the objective function, we insert an extra fraction "epsilon" of random strategies (i.e. playing every possible action with a constant probability) that both sides may additionally encounter. Lower epsilon values mean strategies closer to optimal (i.e. the Nash equilibrium); higher values mean more aggressive strategies that try to maximize the exploitation of random play while protecting themselves against the corresponding aggressive strategies. Feste adapts dynamically to its opponent using the Thompson Sampling algorithm over normal distributions (a sketch follows the references below). For faster adaptation, the element of luck from the hole cards is mitigated. In the Instant Run-off tournament, Feste has three strategies at its disposal: the optimal strategy and two additional epsilon-aggressive strategies, at 1% and 5%. Even though these additional strategies are purposely away from the Nash equilibrium, they are expected to be of some use against real agents, even equilibrium-based ones, without being too exploitable themselves. In the Total Bankroll tournament, Feste is allowed to use an additional 25% epsilon-aggressive strategy. This strategy is a double-edged sword: it may be the best shot against "chumps" but could also be easily exploited by real agents.
    • Hardware:
      The corresponding computations have been carried out on a mid/high-range workstation (12-core dual Xeon, 48 GB of memory and 2 GPU cards). The CPU horsepower is mostly used for probability table generation, bucketing, payoff table computation and match simulations. The GPU cards handle the sparse matrix-vector products (with CUDA) in the PPCG during the solve step, allowing the machine to solve two problems concurrently.
  • References and related papers:
    1. Arpita Ghosh and Stephen Boyd. "Minimax and Convex-Concave Games". EE392o, Stanford University.
    2. Dollar, H. S., Gould, N. I., Schilders, W. H., & Wathen, A. J. (2006). Implicit-factorization preconditioning and iterative solvers for regularized saddle-point systems. SIAM Journal on Matrix Analysis and Applications, 28(1), 170-189.
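    The write-up describes the solver but does not spell out the optimization problem it attacks; as a schematic only, using standard sequence-form notation rather than symbols from the Feste description, the min-max problem has the form

      \[
      \min_{x \ge 0}\;\max_{y \ge 0}\;\; x^{\top} A\, y
      \qquad\text{subject to}\qquad
      E x = e, \qquad F y = f,
      \]

    where x and y are the two players' sequence-form realization plans (one entry per move set), A is the sparse payoff matrix holding the maximizing player's payoff at each game leaf, and E, F with right-hand sides e, f encode the sequence-form consistency constraints. The interior-point/PPCG machinery described above works on this sparse saddle-point system directly instead of first converting it to an LCP.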
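    The adaptation step is not detailed beyond the mention of Thompson Sampling over normal distributions; the following is a minimal, hypothetical Python sketch of that idea for choosing among a handful of strategies, with per-hand (luck-adjusted) winnings as the reward. All names and parameters are illustrative, not Feste's actual code.

      import math, random

      class GaussianThompsonSampler:
          """Thompson Sampling with a normal model of each strategy's per-hand reward."""
          def __init__(self, n_strategies, prior_mean=0.0, prior_count=1.0, reward_std=5.0):
              self.counts = [prior_count] * n_strategies  # pseudo-observation counts
              self.means = [prior_mean] * n_strategies    # running mean reward per strategy
              self.reward_std = reward_std                # assumed per-hand reward noise (illustrative)

          def select(self):
              # Sample a plausible mean for each strategy and play the apparent best.
              samples = [random.gauss(m, self.reward_std / math.sqrt(c))
                         for m, c in zip(self.means, self.counts)]
              return max(range(len(samples)), key=lambda i: samples[i])

          def update(self, strategy, reward):
              # Incremental update of the chosen strategy's running mean.
              self.counts[strategy] += 1
              self.means[strategy] += (reward - self.means[strategy]) / self.counts[strategy]

      # Usage: one arm per strategy (optimal, 1% and 5% epsilon-aggressive), updated after each hand.
      sampler = GaussianThompsonSampler(n_strategies=3)
      arm = sampler.select()
      # ... play one hand with strategy `arm`, observe the luck-adjusted winnings ...
      sampler.update(arm, reward=2.0)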

HITSZ_CS_13

  • Team Name: HITSZ_CS_13
  • Team Leader: Xuan Wang
  • Team Members: Xuan Wang, Jiajia Zhang, Song Wu
  • Affiliation: School of Computer Science and Technology, HIT
  • Location: Shenzhen, Guangdong Province, China
  • Technique:
    Our program makes decisions according to the current hand strength and a set of precomputed probabilities, while at the same time trying to model the opponent. After the opponent model is built, the program takes advantage of the model when making decisions.
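    No further detail is given; purely as an illustration of a hand-strength-plus-thresholds decision rule combined with an opponent model, a sketch might look like the following. All names and numbers are hypothetical, not the team's actual code.

      def choose_action(hand_strength, thresholds, opponent_aggression=0.5):
          """Hypothetical rule: hand_strength in [0, 1], thresholds precomputed per street."""
          # Tighten the raising threshold slightly against an aggressive opponent model.
          raise_cut = thresholds["raise"] + 0.05 * opponent_aggression
          if hand_strength >= raise_cut:
              return "raise"
          if hand_strength >= thresholds["call"]:
              return "call"
          return "fold"

      # Example with precomputed thresholds for one street.
      print(choose_action(0.82, {"raise": 0.75, "call": 0.40}))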

Hyperborean2pl.iro

  • Team Name: University of Alberta
  • Team Leader: Michael Bowling
  • Team Members: Michael Bowling, Duane Szafron, Rob Holte, Chris Archibald, Michael Johanson, Nolan Bard, John Hawkin, Richard Gibson, Neil Burch, Josh Davidson, Trevor Davis
  • Affiliation: University of Alberta
  • Location: Edmonton, Alberta, Canada
  • Technique:
    Hyperborean 2pl IRO is an approximation of a Nash equilibrium for a very large abstract game. The strategy was learned using Chance Sampled CFR [1]. The abstract game uses imperfect recall [2], and the abstraction for each round is created in two steps. First, we divide the public cards into simple categories (number of cards of one suit on the board, number of cards in a straight on the board, pair on the board, etc.), with recall of the division made on earlier rounds. Second, within each category, we use k-means clustering over the recently presented Hand Strength Distribution / Earthmover Distance pre-river feature and the OCHS river feature [3]. The abstraction is perfect on the preflop and flop, with 1,521,978 turn buckets and 840,000 river buckets. A sketch of this two-step bucketing appears after the references below.

    The abstraction is identical to that used by the 2011 and 2012 entries. The 2011 entry was run for 100 billion Chance Sampled CFR iterations, while the 2012 and 2013 entries were run for 130 billion iterations.
  • References and related papers:
    1. Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. "Regret minimization in games with incomplete information" In NIPS 2008.
    2. Kevin Waugh, Martin Zinkevich, Michael Johanson, Morgan Kan, David Schnizlein, and Michael Bowling. "A Practical Use of Imperfect Recall". Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (SARA), 2009.
    3. Michael Johanson, Neil Burch, Richard Valenzano, and Michael Bowling. "Evaluating State-Space Abstractions in Extensive-Form Games". In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2013.
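    The actual features and pipeline are those of [3]; as a rough illustration only, the two-step bucketing (coarse public-card category, then k-means within each category) might be organized as below. Feature extraction is stubbed out, Euclidean k-means stands in for the earthmover-distance clustering of [3], and all names are hypothetical.

      from collections import defaultdict
      import numpy as np
      from sklearn.cluster import KMeans

      def bucket_round(hands, public_category, hand_features, buckets_per_category=50):
          """Group hands by a coarse public-card category, then cluster each group's
          feature vectors (e.g. a hand-strength histogram) into buckets."""
          groups = defaultdict(list)
          for h in hands:
              groups[public_category(h)].append(h)

          assignment = {}
          for cat, members in groups.items():
              X = np.array([hand_features(h) for h in members])
              k = min(buckets_per_category, len(members))
              labels = KMeans(n_clusters=k, n_init=10).fit(X).labels_
              for h, lab in zip(members, labels):
                  assignment[h] = (cat, int(lab))  # bucket = (public category, cluster id)
          return assignment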

Hyperborean2pl.tbr

  • Team Name: University of Alberta
  • Team Leader: Michael Bowling
  • Team Members: Michael Bowling, Duane Szafron, Rob Holte, Chris Archibald, Michael Johanson, Nolan Bard, John Hawkin, Richard Gibson, Neil Burch, Josh Davidson, Trevor Davis
  • Affiliation: University of Alberta
  • Location: Edmonton, Alberta, Canada
  • Technique:
    Hyperborean is an implicit modelling agent [5] consisting of four data-biased response strategies to specific agents seen in the 2010 and 2011 ACPC heads-up limit events. All four strategies were generated using the Counterfactual Regret Minimization (CFR) algorithm [1] with imperfect recall abstractions. Buckets were calculated according to public card textures and k-means clustering over hand strength distributions [6], yielding an imperfect recall abstract game that forgets previous card information and rebuckets on every round [3]. The agents were run for 4 billion iterations of chance sampled CFR. The abstraction uses 169, 18,630, 18,630, and 18,630 buckets on the respective rounds of the game, for a total of 118 million information sets.

    A mixture of these strategies is generated dynamically using a slightly modified Exp4-like algorithm [4], where the reward vector for the experts/strategies is computed using importance sampling over the individual strategies [2]. A sketch of an Exp4-style mixture update appears after the references below.
  • References and related papers:
    1. Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. "Regret minimization in games with incomplete information" In NIPS 2008.
    2. Michael Bowling, Michael Johanson, Neil Burch, and Duane Szafron. "Strategy Evaluation in Extensive Games with Importance Sampling". In Proceedings of the 25th Annual International Conference on Machine Learning (ICML), 2008.
    3. Kevin Waugh, Martin Zinkevich, Michael Johanson, Morgan Kan, David Schnizlein, and Michael Bowling. "A Practical Use of Imperfect Recall". Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (SARA), 2009.
    4. P Auer, N Cesa-Bianchi, Y Freund, and R.E Schapire. "Gambling in a rigged casino: The adversarial multi-armed bandit problem". Proceedings of the 36th Annual Symposium on Foundations of Computer Science, 1995.
    5. Nolan Bard, Michael Johanson, Neil Burch, Michael Bowling. "Online Implicit Agent Modelling". In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2013.
    6. Michael Johanson, Neil Burch, Richard Valenzano, and Michael Bowling. "Evaluating State-Space Abstractions in Extensive-Form Games". In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2013.
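    The exact modifications to Exp4 are not described; the following is a minimal, hypothetical Python sketch of an exponential-weights mixture over a fixed set of expert strategies, where each strategy's per-hand value would come from the importance-sampling estimator of [2]. All names, the learning rate and the reward bound are illustrative.

      import math, random

      def mixture_probabilities(weights):
          """Normalize expert weights into the probability of playing each strategy this hand."""
          total = sum(weights)
          return [w / total for w in weights]

      def exp_weights_update(weights, estimated_rewards, eta=0.05, reward_bound=50.0):
          """Exponential-weights update; estimated_rewards[i] is the importance-sampled
          per-hand value of strategy i (in small blinds), rescaled to [0, 1]."""
          new_weights = []
          for w, r in zip(weights, estimated_rewards):
              r01 = (r + reward_bound) / (2.0 * reward_bound)  # loose rescaling to [0, 1]
              new_weights.append(w * math.exp(eta * r01))
          return new_weights

      # Usage: four data-biased response strategies, updated after every hand.
      weights = [1.0] * 4
      chosen = random.choices(range(4), weights=mixture_probabilities(weights))[0]
      # ... play the hand with strategy `chosen`, estimate every strategy's value via importance sampling ...
      weights = exp_weights_update(weights, estimated_rewards=[1.5, -0.5, 0.0, 2.0])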

LIACC

  • Team Name: LIACC
  • Team Leader: Luis Filipe Teofilo
  • Team Members: Luis Filipe Teofilo
  • Affiliation: University of Porto, Artificial Intelligence and Computer Science Laboratory
  • Location: Porto, Portugal
  • Technique: Expected value maximization with game partition

Little Rock

  • Team Name: Little Rock
  • Team Leader: Rod Byrnes
  • Team Members: Rod Byrnes
  • Affiliation: Independent
  • Location: Goonellabah, NSW, Australia
  • Technique:
    Little Rock uses an external sampling Monte Carlo CFR approach [1] with imperfect recall. All agents in this year's competition use the same card abstraction, which has 8192 buckets on each of the flop, turn and river, created by clustering all possible hands using a variety of metrics from the current and previous rounds. The 2-player limit agent uses no action abstraction. The other two agents use what I call a "cross-sectional" approach, which abstracts aspects of the current game state rather than translating individual actions (what I call a "longitudinal" approach). A sketch of external sampling MCCFR appears after the reference below.
  • References and related papers:
    1. Monte Carlo Sampling for Regret Minimization in Extensive Games. Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. In Advances in Neural Information Processing Systems 22 (NIPS), pp. 1078-1086, 2009.
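    For readers unfamiliar with the cited algorithm, the following is a compact, hypothetical Python sketch of one external-sampling MCCFR traversal as described in [1]. The `game` interface (is_terminal, utility, is_chance, sample_chance, current_player, infoset_key, legal_actions, next_state) is an assumed abstraction, not Little Rock's code.

      import random

      def regret_matching(regrets):
          positives = [max(r, 0.0) for r in regrets]
          total = sum(positives)
          return [p / total for p in positives] if total > 0 else [1.0 / len(regrets)] * len(regrets)

      def external_sampling_cfr(game, state, update_player, regret, strategy_sum):
          """One traversal for `update_player`: sample chance and opponent actions,
          enumerate the update player's actions, and accumulate counterfactual regrets."""
          if game.is_terminal(state):
              return game.utility(state, update_player)
          if game.is_chance(state):
              return external_sampling_cfr(game, game.sample_chance(state),
                                           update_player, regret, strategy_sum)

          key = game.infoset_key(state)
          actions = game.legal_actions(state)
          regret.setdefault(key, [0.0] * len(actions))
          strategy_sum.setdefault(key, [0.0] * len(actions))
          sigma = regret_matching(regret[key])

          if game.current_player(state) != update_player:
              # Opponent node: accumulate the average strategy and sample one action.
              for i, p in enumerate(sigma):
                  strategy_sum[key][i] += p
              a = random.choices(range(len(actions)), weights=sigma)[0]
              return external_sampling_cfr(game, game.next_state(state, actions[a]),
                                           update_player, regret, strategy_sum)

          # Update player's node: explore every action and accumulate regrets.
          values = [external_sampling_cfr(game, game.next_state(state, a),
                                          update_player, regret, strategy_sum)
                    for a in actions]
          node_value = sum(p * v for p, v in zip(sigma, values))
          for i, v in enumerate(values):
              regret[key][i] += v - node_value
          return node_value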

Marv

  • Team Name: Bacalhau
  • Team Leader: Marv Andersen
  • Team Members: Marv Andersen
  • Affiliation: Independent
  • Location: London, UK
  • Technique:
    This bot is a neural net trained to imitate the play of previous ACPC winners.
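    No details are given beyond the sentence above; purely as an illustration of the general idea, imitation can be cast as supervised learning from logged decisions of the imitated agents. The sketch below uses a simple softmax (logistic) policy in NumPy as a stand-in for the neural network; everything here is hypothetical.

      import numpy as np

      def softmax(z):
          z = z - z.max(axis=1, keepdims=True)
          e = np.exp(z)
          return e / e.sum(axis=1, keepdims=True)

      def train_imitation_policy(X, y, n_actions=3, lr=0.1, epochs=200):
          """Fit a linear softmax policy mapping decision features to the
          fold/call/raise choices observed in the imitated players' logs."""
          W = np.zeros((X.shape[1], n_actions))
          onehot = np.eye(n_actions)[y]
          for _ in range(epochs):
              probs = softmax(X @ W)
              W -= lr * X.T @ (probs - onehot) / len(X)
          return W

      # Usage: X holds per-decision features (hand strength, pot size, betting history, ...);
      # y holds the action the imitated agent actually took (0 = fold, 1 = call, 2 = raise).
      X, y = np.random.rand(1000, 8), np.random.randint(0, 3, size=1000)
      W = train_imitation_policy(X, y)
      action_probs = softmax(X[:1] @ W)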

Neo Poker Bot

  • Team Name: Neo Poker Laboratory
  • Team Leader: Alexander Lee
  • Team Members: Alexander Lee
  • Affiliation: Independent
  • Location:
  • Technique:
    The bot was built using proprietary universal game-theory methods applied to poker. We perform a complete Fixed Limit Hold'em game tree search without approximation. The original AI uses its own database of about 3 TB; to comply with the competition format, our team provided a specially simplified version of Neo, Neopokerbot_FL2V.

ProPokerTools

  • Team Name: ProPokerTools
  • Team Leader: Dan Hutchings
  • Team Members: Dan Hutchings
  • Affiliation: ProPokerTools
  • Location: Lakewood, Colorado, USA
  • Technique:
    This heads-up limit hold'em agent was created using established methods: regret minimization, partial recall, and so on. My goal is to develop a suite of training tools built around strong approximations of unexploitable strategies. I have started with heads-up limit hold'em as a test case, as it is the game with the largest body of research; I fully intend to move on to other games and already have some promising results.
    I have given myself a constraint in building my AI agents: all agents are created on a single machine that costs less than $1,000. I do not have high hopes for victory; I am primarily interested in how close to the best-of-the-best I can get using established methods on commodity hardware. "Pretty close" is more than good enough for my purposes. I have not attempted to optimize my AI agents for competition play.

Slugathorus

  • Team Name: Slugathorus
  • Team Leader: Daniel Berger
  • Team Members: Daniel Berger
  • Affiliation: University of New South Wales
  • Location: Sydney, Australia
  • Technique:
    The agent plays an approximate Nash Equilibrium strategy generated by public chance sampled MCCFR over an abstraction with 2 billion information sets.
  • References and related papers:
    1. "Efficient Nash Equilibrium Approximation through Monte Carlo Counterfactual Regret Minimization". (Johanson, 2012)
    2. "Regret Minimization in Games With Incomplete Information". (Zinkevich, 2007)

UNamur / Joshua

  • Team Name: University of Namur
  • Team Leader: Nicolas Verbeeren
  • Team Members: Nicolas Verbeeren
  • Affiliation: University of Namur
  • Location: Namur, Namur, Belgium
  • Technique:
    Joshua is based on a maximum entropy probabilistic model.

ZBot

  • Team Name: ZBot
  • Team Leader: Ilkka Rajala
  • Team Members: Ilkka Rajala
  • Affiliation: Independent
  • Location: Helsinki, Finland
  • Technique:
    A counterfactual regret minimization implementation that works in two phases. In the first phase, the model is built dynamically by expanding it (observing more buckets) in situations that are visited more often, until the desired size has been reached. In the second phase, that model is solved by counterfactual regret minimization. A sketch of the first phase is given below.

    Basically the same as ZBot 2012, only much bigger.
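    The expansion criterion is not described beyond "situations which are visited more often"; the following is a hypothetical Python sketch of such a first phase, where the most frequently visited buckets are repeatedly split until a size budget is reached. The `split_bucket` refinement function and the even split of traffic are assumptions, not ZBot's actual mechanism.

      import heapq

      def expand_abstraction(visit_counts, initial_buckets, max_buckets, split_bucket):
          """Repeatedly split the most-visited bucket until the model reaches the target size."""
          heap = [(-visit_counts[b], b) for b in initial_buckets]  # max-heap via negated counts
          heapq.heapify(heap)
          buckets = set(initial_buckets)
          while len(buckets) < max_buckets and heap:
              neg_count, b = heapq.heappop(heap)
              buckets.discard(b)
              for child in split_bucket(b):  # hypothetical: returns two finer buckets
                  buckets.add(child)
                  heapq.heappush(heap, (neg_count / 2.0, child))  # assume traffic splits evenly
          return buckets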