Participants: 2012 - Heads-up Limit Texas Hold'em

Heads-up Limit Texas Hold'em

Entropy

  • Team Name: ERGOD
  • Team Leader: Ken Barry
  • Team Members: Ken Barry
  • Affiliation: ERGOD
  • Location: Athlone, Westmeath, Ireland
  • Technique:
  • Entropy is powered by "ExperienceEngine", an agent capable of acting intelligently in any indeterminate system. Development of ExperienceEngine is ongoing and its inner workings cannot be revealed at this time.

Feste

  • Team Name: Feste
  • Team Leader: François Pays
  • Team Members: François Pays
  • Affiliation: Independent
  • Location: Paris, France
  • Technique:
  • The 2-player limit game is modeled in sequence form and solved as a min-max problem using a conventional interior-point method (a toy sketch of this min-max formulation is given below). The betting structure is kept intact with no loss of information, but card information states are aggregated into clusters depending on the betting round (flop, turn and river). The min-max problem is solved using a convex-concave variant of the log-barrier path-following interior-point method. The inner Newton system is a large sparse saddle-point system; using an ad hoc Krylov method along with preconditioning, the system is tractable on consumer hardware. As the solution is approached, the system becomes more and more ill-conditioned, so several techniques are used to stabilize the Krylov solver: dynamic precision control, variable elimination and regularization. The required accuracy is reached in about 250 iterations.
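
    The solver details above are Feste's own and are not reproduced here. Purely as an illustration of the underlying min-max formulation, the following toy sketch (an assumption, not Feste's code) solves a small zero-sum matrix game as a linear program, with SciPy's HiGHS solver standing in for the custom log-barrier interior-point method; the payoff matrix is made up, and the real entry works on the far larger sequence-form game.

      # Toy sketch: solve max_x min_j (A^T x)_j for a small zero-sum matrix game
      # as a linear program. Feste actually solves the sequence-form game with a
      # custom log-barrier interior-point method; this only shows the formulation.
      import numpy as np
      from scipy.optimize import linprog

      A = np.array([[ 1.0, -1.0,  0.5],      # made-up row-player payoffs
                    [-0.5,  1.0, -1.0],
                    [ 0.0, -0.5,  1.0]])
      n_rows, n_cols = A.shape

      # Variables: x (row player's mixed strategy) and v (the game value).
      # Maximizing v is the same as minimizing -v.
      c = np.concatenate([np.zeros(n_rows), [-1.0]])

      # For every column j:  v - (A^T x)_j <= 0.
      A_ub = np.hstack([-A.T, np.ones((n_cols, 1))])
      b_ub = np.zeros(n_cols)

      # x must be a probability distribution.
      A_eq = np.concatenate([np.ones(n_rows), [0.0]]).reshape(1, -1)
      b_eq = np.array([1.0])

      bounds = [(0.0, None)] * n_rows + [(None, None)]
      res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                    bounds=bounds, method="highs")   # "highs-ipm" forces interior point
      print("row strategy:", res.x[:n_rows], "game value:", -res.fun)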

Huhuers

  • Team Name: Huhubot
  • Team Leader: Shawne Lo
  • Team Members: Shawne Lo, Wes Ren Tong
  • Affiliation: Independent
  • Location: Toronto, Canada
  • Technique:
    Case-based reasoning through imitation of proven strong agents.

Hyperborean2p.iro

  • Team Name: University of Alberta
  • Team Leader: Michael Bowling
  • Team Members: Michael Bowling, Duane Szafron, Rob Holte, Chris Archibald, Michael Johanson, Nolan Bard, John Hawkin, Richard Gibson, Neil Burch, Parisa Mazrooei, Josh Davidson
  • Affiliation: University of Alberta
  • Location: Edmonton, Alberta, Canada
  • Technique:
    The 2-player instant run-off program is built using the Public Chance Sampling (PCS) [1] variant of Counterfactual Regret Minimization [2]. We solve a large abstract game that is identical to Texas Hold'em on the preflop and flop. On the turn and river, we bucket the hands and public cards together, using approximately 1.5 million categories on the turn and 900 thousand categories on the river. A generic sketch of the regret-matching update at the core of CFR follows the references below.
  • References and related papers:
    • Michael Johanson, Nolan Bard, Marc Lanctot, Richard Gibson, and Michael Bowling. "Efficient Nash Equilibrium Approximation through Monte Carlo Counterfactual Regret Minimization" In AAMAS 2012
    • Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. "Regret minimization in games with incomplete information" In NIPS 2008.
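
    The PCS traversal and abstraction above are specific to Hyperborean and are not sketched here. As a generic illustration of the regret-matching update shared by all CFR variants (a textbook step, not Hyperborean's code), here is a minimal sketch with made-up numbers.

      # Minimal regret-matching sketch: the per-information-set update used inside
      # CFR and its sampling variants such as PCS. All numbers are illustrative.
      import numpy as np

      def regret_matching(cumulative_regret):
          """Current strategy: positive regrets normalized, otherwise uniform."""
          positive = np.maximum(cumulative_regret, 0.0)
          total = positive.sum()
          if total > 0.0:
              return positive / total
          return np.full_like(cumulative_regret, 1.0 / len(cumulative_regret))

      # An information set with three actions (fold, call, raise) and made-up
      # cumulative counterfactual regrets accumulated by earlier iterations:
      cum_regret = np.array([-2.0, 5.0, 1.0])
      strategy = regret_matching(cum_regret)           # -> [0.0, 0.833..., 0.166...]

      # One CFR iteration then computes a counterfactual value per action and
      # accumulates the difference to the strategy's expected value as regret.
      cf_value = np.array([0.0, 1.2, 0.7])             # made-up counterfactual values
      cum_regret += cf_value - strategy @ cf_value
      print(regret_matching(cum_regret))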

Hyperborean2p.tbr

  • Team Name: University of Alberta
  • Team Leader: Michael Bowling
  • Team Members: Michael Bowling, Duane Szafron, Rob Holte, Chris Archibald, Michael Johanson, Nolan Bard, John Hawkin, Richard Gibson, Neil Burch, Parisa Mazrooei, Josh Davidson
  • Affiliation: University of Alberta
  • Location: Edmonton, Alberta, Canada
  • Technique:
    Hyperborean-2012-2p-limit-tbr is an agent consisting of seven abstract strategies. All seven strategies were generated using the Counterfactual Regret Minimization (CFR) algorithm [1] with imperfect recall abstractions [3]. They are:

    • Two strategies in an imperfect recall abstraction using 57 million information sets that specifically counter opponents who always raise or always call.
    • An approximation of an equilibrium within a large imperfect recall abstraction that has 879,586,352 information sets, with an unabstracted, perfect recall preflop and flop.
    • Four strategies in the smaller (57 million information sets) abstraction that are responses to models of particular opponents seen in the 2010 or 2011 ACPC.

    During a match, the counter-strategies to always raise and always call are only used if the opponent is detected to be always raising or always calling. Otherwise, a mixture of the remaining five strategies is used. The mixture is generated using a slightly modified Hedge algorithm [4], where the reward vector for the experts/strategies is computed using importance sampling over the individual strategies [2]. A minimal Hedge sketch follows the references below.
  • References and related papers:
    • Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. "Regret minimization in games with incomplete information" In NIPS 2008.
    • Michael Bowling, Michael Johanson, Neil Burch, and Duane Szafron. "Strategy Evaluation in Extensive Games with Importance Sampling". In Proceedings of the 25th Annual International Conference on Machine Learning (ICML), 2008.
    • Kevin Waugh, Martin Zinkevich, Michael Johanson, Morgan Kan, David Schnizlein, and Michael Bowling. "A Practical Use of Imperfect Recall". Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (SARA), 2009.
    • P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. "Gambling in a rigged casino: The adversarial multi-armed bandit problem". Proceedings of the 36th Annual Symposium on Foundations of Computer Science, 1995.
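
    The reward estimation Hyperborean actually uses (importance sampling over its own strategies [2], plus unspecified modifications to Hedge) is not reproduced here. The sketch below only shows the standard Hedge/exponential-weights update [4] for mixing a fixed set of expert strategies; the learning rate and per-hand reward estimates are made up.

      # Minimal Hedge (exponential weights) sketch for mixing expert strategies.
      # Rewards and the learning rate are made up; the real agent estimates
      # rewards by importance sampling and uses a modified update.
      import numpy as np

      n_experts = 5                  # the five non-counter strategies
      eta = 0.1                      # learning rate (competition value not published)
      log_weights = np.zeros(n_experts)

      def mixture(log_w):
          w = np.exp(log_w - log_w.max())    # subtract max for numerical stability
          return w / w.sum()

      rng = np.random.default_rng(0)
      for hand in range(1000):
          probs = mixture(log_weights)       # play this hand with this mixture
          # Estimated reward of each expert on the hand (placeholder values
          # standing in for importance-sampled estimates, in big blinds):
          est_reward = rng.normal(loc=[0.01, 0.03, -0.02, 0.0, 0.02], scale=0.5)
          log_weights += eta * est_reward    # Hedge update, kept in log space

      print("final mixture over the five strategies:", mixture(log_weights))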

LittleAce

  • Team Name: LittleAce
  • Team Leader:
  • Team Members:
  • Affiliation:
  • Location:
  • Technique:

LittleRock

  • Team Name: LittleRock
  • Team Leader: Rod Byrnes
  • Team Members: Rod Byrnes
  • Affiliation: Independent
  • Location: Lismore, Australia
  • Technique:
    LittleRock uses an external sampling Monte Carlo CFR approach with imperfect recall. Additional RAM was available for training the agent entered into this year's competition, which allowed for a finer-grained card abstraction, but the algorithm is otherwise largely unchanged. One last-minute addition this year is a no-limit agent.

    The no-limit agent has 4,491,849 information sets, the heads-up limit agent has 11,349,052 information sets and the limit 3-player agent has 47,574,530 information sets. In addition to card abstractions, the 3-player and no-limit agents also use a form of state abstraction to make the game size manageable. A generic card-bucketing sketch follows the reference below.
  • References and related papers:
    • Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. "Monte Carlo Sampling for Regret Minimization in Extensive Games". In Advances in Neural Information Processing Systems 22 (NIPS), pp. 1078-1086, 2009.
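
    LittleRock's writeup does not say how its card abstraction is constructed, so the following is only a generic illustration of what a finer-grained card abstraction can look like: hands are bucketed by quantiles of a one-dimensional strength estimate. The strength values, bucket count and bucketing rule are placeholders, not LittleRock's.

      # Generic card-abstraction sketch: bucket hands by quantiles of a strength
      # estimate so each bucket holds roughly the same number of hands.
      # NOT LittleRock's actual abstraction; its features are not described.
      import numpy as np

      rng = np.random.default_rng(1)
      hand_strength = rng.uniform(size=10_000)   # placeholder strength estimates
      n_buckets = 50                             # more RAM -> more buckets -> finer grain

      edges = np.quantile(hand_strength, np.linspace(0.0, 1.0, n_buckets + 1))
      bucket = np.clip(np.searchsorted(edges, hand_strength, side="right") - 1,
                       0, n_buckets - 1)

      # During training, hands in the same bucket share one information set,
      # which is what keeps the abstract game small enough to solve.
      print("hands per bucket (first 5):", np.bincount(bucket, minlength=n_buckets)[:5])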

Neo Poker Bot

  • Team Name: Neo Poker Laboratory
  • Team Leader: Alexander Lee
  • Team Members: Alexander Lee
  • Affiliation: Independent
  • Location: Spain
  • Technique:
    Our range of computer players was developed to play against humans. The AI was trained on real-money hand history logs from top poker rooms. The AI logic employs different combinations of neural networks, regret minimization and gradient-search equilibrium approximation, decision trees, and recursive search methods, as well as expert algorithms from top players in different games of poker. Our computer players have been tested against humans and demonstrated strong results over 100 million hands. The AI was not optimized to play against computer players.

Patience

  • Team Name: Patience
  • Team Leader: Nick Grozny
  • Team Members: Nick Grozny
  • Affiliation: Independent
  • Location: Moscow, Russia
  • Technique:
    Patience uses a static strategy built by the fictitious play algorithm.
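
    Since the description above is a single sentence, the following is only a generic illustration of fictitious play on a small made-up zero-sum matrix game; Patience's real implementation operates on the full poker game and its details are not given.

      # Generic fictitious play sketch on a tiny zero-sum matrix game.
      # The payoff matrix is made up; the empirical average strategies of both
      # players converge toward an equilibrium of this matrix game.
      import numpy as np

      A = np.array([[ 0.0,  1.0, -1.0],   # rock-paper-scissors style payoffs
                    [-1.0,  0.0,  1.0],
                    [ 1.0, -1.0,  0.0]])

      row_counts = np.zeros(3)   # how often each pure strategy was a best response
      col_counts = np.zeros(3)

      for t in range(10_000):
          # Each player best-responds to the opponent's empirical average strategy.
          col_avg = col_counts / col_counts.sum() if col_counts.sum() else np.full(3, 1/3)
          row_avg = row_counts / row_counts.sum() if row_counts.sum() else np.full(3, 1/3)
          row_counts[np.argmax(A @ col_avg)] += 1    # row player maximizes
          col_counts[np.argmin(row_avg @ A)] += 1    # column player minimizes

      print("row average strategy:", row_counts / row_counts.sum())
      print("col average strategy:", col_counts / col_counts.sum())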

Sartre

  • Team Name: Sartre
  • Team Leader: Jonathan Rubin
  • Team Members: Jonathan Rubin, Ian Watson
  • Affiliation: University of Auckland
  • Location: Auckland, New Zealand
  • Technique:
    Sartre uses a case-based approach to play Texas Hold'em. AAAI hand history data from multiple agents are encoded into distinct case-bases. When it is time for Sartre to make a betting decision, a case is created with the current game state information. Each individual case-base is then searched for similar scenarios, resulting in a collection of playing decisions. A final decision is made via ensemble voting; a toy retrieval-and-voting sketch follows the references below.
  • References and related papers:
    • Jonathan Rubin and Ian Watson. Case-Based Strategies in Computer Poker, AI Communications, Volume 25, Number 1: 19-48, March 2012.
    • Jonathan Rubin and Ian Watson. On Combining Decisions from Multiple Expert Imitators for Performance. In IJCAI-11, Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, 2011.
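
    Sartre's actual case representation, similarity measures and case-bases are described in the papers above and are not reproduced here. The sketch below only illustrates the general pattern of nearest-neighbour retrieval from several case-bases followed by a vote; the features, cases and distance function are all made up.

      # Toy sketch of case-based decision making with ensemble voting.
      # Features, cases and the similarity measure are made up; Sartre's real
      # case representation and retrieval are described in the cited papers.
      from collections import Counter
      import numpy as np

      # Three case-bases, each built from a different strong agent's hand
      # histories. A case is (feature_vector, action); the 4 features are a
      # hypothetical encoding of hand strength, betting sequence, etc.
      rng = np.random.default_rng(2)
      case_bases = []
      for agent in range(3):
          features = rng.uniform(size=(200, 4))
          actions = rng.choice(["fold", "call", "raise"], size=200)
          case_bases.append((features, actions))

      def decide(current_state):
          """Retrieve the most similar case from each case-base, then vote."""
          votes = []
          for features, actions in case_bases:
              distances = np.linalg.norm(features - current_state, axis=1)
              votes.append(actions[np.argmin(distances)])
          return Counter(votes).most_common(1)[0][0]

      print(decide(np.array([0.8, 0.2, 0.5, 0.1])))   # e.g. "call"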

Slumbot

  • Team Name: Slumbot
  • Team Leader: Eric Jackson
  • Team Members: Eric Jackson
  • Affiliation: Independent
  • Location: Menlo Park, CA, USA
  • Technique:
    Slumbot employs the Public Chance Sampling variant of Counterfactual Regret Minimization. We use a large abstraction with 88 billion information sets. There is no abstraction on any street prior to the river. On the river there are about 4.7 million bins.

    As a consequence of the large abstraction size and our relatively modest compute environment, our system is disk-based: regrets and accumulated probabilities are written to disk on each iteration. A minimal disk-backed storage sketch follows the references below.
  • References and related papers:
    • [Johanson 2012] Efficient Nash Equilibrium Approximation through Monte Carlo Counterfactual Regret Minimization
    • [Johanson 2011] Accelerating Best Response Calculation in Large Extensive Games
    • [Zinkevich 2007] Regret Minimization in Games with Incomplete Information
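
    Slumbot's on-disk format is not described beyond "written to disk on each iteration", so the following only sketches one possible way to keep regret and accumulated-probability tables on disk, using numpy memory-mapped arrays; the file names, table size and update are placeholders (and vastly smaller than 88 billion information sets).

      # Minimal sketch of disk-backed regret storage with numpy memmaps.
      # File names, sizes and layout are illustrative only; Slumbot's actual
      # on-disk format is not described in the writeup.
      import numpy as np

      N_ENTRIES = 1_000_000          # stand-in for (information set, action) pairs
      regrets = np.memmap("regrets.dat", dtype=np.float32,
                          mode="w+", shape=(N_ENTRIES,))
      avg_probs = np.memmap("avg_probs.dat", dtype=np.float32,
                            mode="w+", shape=(N_ENTRIES,))

      def cfr_iteration(rng):
          # Placeholder for a real PCS-CFR traversal: just add random deltas here.
          idx = rng.integers(0, N_ENTRIES, size=10_000)
          regrets[idx] += rng.normal(size=idx.size).astype(np.float32)
          avg_probs[idx] += rng.uniform(size=idx.size).astype(np.float32)

      rng = np.random.default_rng(3)
      for it in range(5):
          cfr_iteration(rng)
          regrets.flush()            # push this iteration's updates to disk
          avg_probs.flush()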

ZBot

  • Team Name: ZBot
  • Team Leader: Ilkka Rajala
  • Team Members: Ilkka Rajala
  • Affiliation: Independent
  • Location: Helsinki, Finland
  • Technique:
    ZBot is a counterfactual regret minimization implementation that works in two phases. In the first phase the model is built dynamically by expanding it (distinguishing more buckets) in situations that are visited more often, until the desired size has been reached.
    In the second phase that model is solved by counterfactual regret minimization.

    The model has 1024 possible board-texture buckets for each street, and 169/1024/512/512 hand-type buckets for the preflop/flop/turn/river. How many buckets are actually used in a given situation depends on how common that situation is.
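
    ZBot's exact expansion rule is not spelled out above, so the sketch below only illustrates the general idea of giving frequently visited situations more buckets; the visit counts and the proportional allocation rule are made up.

      # Toy sketch of frequency-driven bucket allocation: situations visited more
      # often get more hand-type buckets. Visit counts and the allocation rule
      # are made up; ZBot's actual expansion criterion is not published.
      import numpy as np

      rng = np.random.default_rng(4)
      visits = rng.pareto(a=1.5, size=100) + 1.0   # made-up visit counts per situation
      max_buckets = 512                            # e.g. the turn/river cap above
      min_buckets = 1

      # Allocate buckets roughly in proportion to visit frequency, with caps.
      raw = visits / visits.sum() * (max_buckets * 10)
      buckets = np.clip(raw.astype(int), min_buckets, max_buckets)

      print("most visited situation gets", buckets[np.argmax(visits)], "buckets")
      print("least visited situation gets", buckets[np.argmin(visits)], "buckets")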