Participants: 2012

The 2012 competition had 13 different agents in the heads-up limit Texas hold'em competition, 11 agents in the heads-up no-limit competition, and 5 agents in the 3-player limit competition. As in previous years, agents were submitted by a mixture of universities and individual hobbyists from 10 different countries around the world.

Competitors in the 2012 Annual Computer Poker Competition were not required to supply detailed information about their submission(s) in order to compete, but many supplied information about team members, affiliation, location, and high-level technique, and occasionally relevant papers. This page presents that information.


Heads-up Limit Texas Hold'em

Entropy

  • Team Name: ERGOD
  • Team Leader: Ken Barry
  • Team Members: Ken Barry
  • Affiliation: ERGOD
  • Location: Athlone, Westmeath, Ireland
  • Technique:
  • Entropy is powered by "ExperienceEngine", an agent capable of acting intelligently in any indeterminate system. Development of ExperienceEngine is ongoing and its inner workings cannot be revealed at this time.

Feste

  • Team Name: Feste
  • Team Leader: François Pays
  • Team Members: François Pays
  • Affiliation: Independent
  • Location: Paris, France
  • Technique:
  • The 2-player limit game is modelled in sequence form and solved as a min-max problem with a conventional interior-point method. The betting structure is kept intact with no loss of information, but card information states are aggregated into clusters depending on the betting round (flop, turn and river). The min-max problem is solved using a convex-concave variant of the log-barrier path-following interior-point method. The inner Newton system is a large, sparse saddle-point system; using an ad hoc Krylov method with preconditioning, it is tractable on consumer hardware. As the solution is approached, the system becomes more and more ill-conditioned, so several techniques are used to stabilize the Krylov solver: dynamic precision control, variable elimination and regularization. The required accuracy is reached in about 250 iterations.
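
    A minimal sketch of the min-max formulation this entry refers to, assuming nothing about Feste's actual solver: it poses a tiny zero-sum matrix game as a linear program and hands it to SciPy's general-purpose LP solver rather than a specialized interior-point method. The game and all names below are placeholders.

        # Minimax strategy of a zero-sum matrix game via linear programming (illustration only).
        import numpy as np
        from scipy.optimize import linprog

        # A[i, j]: row player's payoff when playing row i against column j.
        A = np.array([[ 0.0, -1.0,  1.0],
                      [ 1.0,  0.0, -1.0],
                      [-1.0,  1.0,  0.0]])   # rock-paper-scissors stand-in

        m, n = A.shape
        # Variables: row strategy x (length m) and game value v; minimize -v.
        c = np.concatenate([np.zeros(m), [-1.0]])
        # For every column j: v - sum_i A[i, j] x[i] <= 0.
        A_ub = np.hstack([-A.T, np.ones((n, 1))])
        b_ub = np.zeros(n)
        A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)   # probabilities sum to 1
        b_eq = np.array([1.0])
        bounds = [(0, None)] * m + [(None, None)]                   # x >= 0, v free

        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
        print("minimax strategy:", np.round(res.x[:m], 3), "value:", round(res.x[m], 3))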

Huhuers

  • Team Name: Huhubot
  • Team Leader: Shawne Lo
  • Team Members: Shawne Lo, Wes Ren Tong
  • Affiliation: Independent
  • Location: Toronto, Canada
  • Technique:
    Case-based reasoning through imitation of proven strong agents.

Hyperborean2p.iro

  • Team Name: University of Alberta
  • Team Leader: Michael Bowling
  • Team Members: Michael Bowling, Duane Szafron, Rob Holte, Chris Archibald, Michael Johanson, Nolan Bard, John Hawkin, Richard Gibson, Neil Burch, Parisa Mazrooei, Josh Davidson
  • Affiliation: University of Alberta
  • Location: Edmonton, Alberta, Canada
  • Technique:
    The 2-player instant run-off program is built using the Public Chance Sampling (PCS) [1] variant of Counterfactual Regret Minimization [2]. We solve a large abstract game, identical to Texas Hold'em in the preflop and flop. On the turn and river, we bucket the hands and public cards together, using approximately 1.5 million categories on the turn and 900 thousand categories on the river. (A toy regret-matching sketch, illustrating the core update inside CFR, follows the references below.)
  • References and related papers:
    • Michael Johanson, Nolan Bard, Marc Lanctot, Richard Gibson, and Michael Bowling. "Efficient Nash Equilibrium Approximation through Monte Carlo Counterfactual Regret Minimization" In AAMAS 2012
    • Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. "Regret minimization in games with incomplete information" In NIPS 2008.
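
    The toy sketch referred to above: regret matching, the per-information-set update at the core of CFR, run here on a single matrix game rather than on an abstracted hold'em tree. Public Chance Sampling and the card bucketing are not modelled; everything named below is a placeholder.

        # Regret matching on rock-paper-scissors; average strategies approach equilibrium.
        import numpy as np

        A = np.array([[ 0, -1,  1],
                      [ 1,  0, -1],
                      [-1,  1,  0]], dtype=float)   # row player's payoffs

        def regret_matching(regrets):
            pos = np.maximum(regrets, 0.0)
            return pos / pos.sum() if pos.sum() > 0 else np.full(len(regrets), 1.0 / len(regrets))

        regret_row, regret_col = np.zeros(3), np.zeros(3)
        avg_row = np.zeros(3)
        for t in range(20000):
            s_row, s_col = regret_matching(regret_row), regret_matching(regret_col)
            avg_row += s_row
            u_row = A @ s_col            # value of each row action vs the column strategy
            u_col = -(A.T @ s_row)       # column player's action values (zero-sum)
            regret_row += u_row - s_row @ u_row
            regret_col += u_col - s_col @ u_col

        print("average row strategy:", np.round(avg_row / avg_row.sum(), 3))   # ~[1/3, 1/3, 1/3]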

Hyperborean2p.tbr

  • Team Name: University of Alberta
  • Team Leader: Michael Bowling
  • Team Members: Michael Bowling, Duane Szafron, Rob Holte, Chris Archibald, Michael Johanson, Nolan Bard, John Hawkin, Richard Gibson, Neil Burch, Parisa Mazrooei, Josh Davidson
  • Affiliation: University of Alberta
  • Location: Edmonton, Alberta, Canada
  • Technique:
    Hyperborean-2012-2p-limit-tbr is an agent consisting of seven abstract strategies. All seven strategies were generated using the Counterfactual Regret Minimization (CFR) algorithm [1] with imperfect recall abstractions [3]. They are:

    • Two strategies in an imperfect recall abstraction with 57 million information sets that specifically counter opponents who always raise or always call.
    • An approximation of an equilibrium within a large imperfect recall abstraction with 879,586,352 information sets and an unabstracted, perfect recall preflop and flop.
    • Four strategies in the smaller (57 million information sets) abstraction that are responses to models of particular opponents seen in the 2010 or 2011 ACPC.

    During a match, the counter-strategies to always-raise and always-call are used only if the opponent is detected to be playing always-raise or always-call. Otherwise, a mixture of the remaining five strategies is used. The mixture is generated using a slightly modified Hedge algorithm [4], where the reward vector for the experts/strategies is computed using importance sampling over the individual strategies [2]. (A minimal Hedge sketch follows the references below.)
  • References and related papers:
    • Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. "Regret minimization in games with incomplete information" In NIPS 2008.
    • Michael Bowling, Michael Johanson, Neil Burch, and Duane Szafron. "Strategy Evaluation in Extensive Games with Importance Sampling". In Proceedings of the 25th Annual International Conference on Machine Learning (ICML), 2008.
    • Kevin Waugh, Martin Zinkevich, Michael Johanson, Morgan Kan, David Schnizlein, and Michael Bowling. "A Practical Use of Imperfect Recall". Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (SARA), 2009.
    • P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. "Gambling in a rigged casino: The adversarial multi-armed bandit problem". Proceedings of the 36th Annual Symposium on Foundations of Computer Science, 1995.
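
    The Hedge sketch mentioned above: exponential weights over a handful of candidate strategies, driven by per-hand reward estimates. In the real agent those estimates come from importance sampling over the strategies [2]; here random numbers stand in for them, and the learning rate is an arbitrary choice.

        # Minimal Hedge (exponential weights) mixture over five candidate strategies.
        import math, random

        def hedge_weights(cum_rewards, eta=0.1):
            exps = [math.exp(eta * r) for r in cum_rewards]
            total = sum(exps)
            return [e / total for e in exps]

        cum_rewards = [0.0] * 5
        for hand in range(1000):
            w = hedge_weights(cum_rewards)
            playing = random.choices(range(5), weights=w)[0]   # strategy used for this hand
            # Placeholder reward estimates; the agent would importance-sample these instead.
            estimates = [random.gauss(mu, 1.0) for mu in (0.1, 0.0, -0.1, 0.05, 0.02)]
            cum_rewards = [c + r for c, r in zip(cum_rewards, estimates)]

        print("final mixture:", [round(x, 3) for x in hedge_weights(cum_rewards)])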

LittleAce

  • Team Name: LittleAce
  • Team Leader:
  • Team Members:
  • Affiliation:
  • Location:
  • Technique:

LittleRock

  • Team Name: LittleRock
  • Team Leader: Rod Byrnes
  • Team Members: Rod Byrnes
  • Affiliation: Independent
  • Location: Lismore, Australia
  • Technique:
    LittleRock uses an external sampling Monte Carlo CFR approach with imperfect recall. Additional RAM was available for training the agent entered into this year's competition, which allowed for a finer-grained card abstraction, but the algorithm is otherwise largely unchanged. One last-minute addition this year is a no-limit agent.

    The no-limit agent has 4,491,849 information sets, the heads-up limit agent has 11,349,052 information sets and the limit 3-player agent has 47,574,530 information sets. In addition to card abstractions, the 3-player and no-limit agents also use a form of state abstraction to make the game size manageable.
  • References and related papers:
    • Monte Carlo Sampling for Regret Minimization in Extensive Games. Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. In Advances in Neural Information Processing Systems 22 (NIPS), pp. 1078–1086, 2009.

Neo Poker Bot

  • Team Name: Neo Poker Laboratory
  • Team Leader: Alexander Lee
  • Team Members: Alexander Lee
  • Affiliation: Independent
  • Location: Spain
  • Technique:
    Our range of computer players was developed to play against humans. The AI was trained on real-money hand-history logs from top poker rooms. The AI logic employs different combinations of neural networks, regret minimization and gradient-search equilibrium approximation, decision trees, and recursive search methods, as well as expert algorithms from top players in different games of poker. Our computer players have been tested against humans and demonstrated great results over 100 million hands. The AI was not optimized to play against computer players.

Patience

  • Team Name: Patience
  • Team Leader: Nick Grozny
  • Team Members: Nick Grozny
  • Affiliation: Independent
  • Location: Moscow, Russia.
  • Technique:
    Patience uses a static strategy built by the fictitious play algorithm.
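
    For illustration, a bare-bones fictitious-play loop on a small matrix game; Patience runs the algorithm offline on an abstracted poker game, which is not reproduced here. Each player repeatedly best-responds to the opponent's empirical average strategy.

        # Fictitious play on rock-paper-scissors; empirical strategies approach equilibrium.
        import numpy as np

        A = np.array([[ 0, -1,  1],
                      [ 1,  0, -1],
                      [-1,  1,  0]], dtype=float)   # row player's payoffs, zero-sum game

        counts_row, counts_col = np.ones(3), np.ones(3)   # empirical action counts
        for t in range(20000):
            br_row = np.argmax(A @ (counts_col / counts_col.sum()))       # best response to column's average
            br_col = np.argmax(-(A.T @ (counts_row / counts_row.sum())))  # best response to row's average
            counts_row[br_row] += 1
            counts_col[br_col] += 1

        print("empirical row strategy:", np.round(counts_row / counts_row.sum(), 3))   # ~uniform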


Sartre

  • Team Name: Sartre
  • Team Leader: Jonathan Rubin
  • Team Members: Jonathan Rubin, Ian Watson
  • Affiliation: University of Auckland
  • Location: Auckland, New Zealand
  • Technique:
    Sartre uses a case-based approach to play Texas Hold'em. AAAI hand history data from multiple agents are encoded into distinct case-bases. When it is time for Sartre to make a betting decision, a case with the current game-state information is created. Each individual case-base is then searched for similar scenarios, resulting in a collection of playing decisions. A final decision is made via ensemble voting. (A schematic retrieval-and-voting sketch follows the references below.)
  • References and related papers:
    • Jonathan Rubin and Ian Watson. Case-Based Strategies in Computer Poker, AI Communications, Volume 25, Number 1: 19-48, March 2012.
    • Jonathan Rubin and Ian Watson. (2011). On Combining Decisions from Multiple Expert Imitators for Performance. In IJCAI-11, Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence.
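
    The retrieval-and-voting sketch mentioned above, assuming a toy numeric game-state encoding and similarity measure (not Sartre's): each case-base contributes the action of its most similar stored case, and the majority vote wins.

        # Schematic case-based decision with ensemble voting across per-agent case-bases.
        from collections import Counter

        def similarity(a, b):
            return -sum((x - y) ** 2 for x, y in zip(a, b))   # toy similarity on numeric features

        def decide(case_bases, query):
            votes = []
            for base in case_bases:                                   # one case-base per imitated agent
                features, action = max(base, key=lambda c: similarity(c[0], query))
                votes.append(action)
            return Counter(votes).most_common(1)[0][0]

        # Features: (hand-strength estimate, pot size in big blinds, bets this round).
        base_a = [((0.9, 6, 1), "raise"), ((0.3, 2, 0), "call")]
        base_b = [((0.8, 5, 1), "raise"), ((0.2, 4, 1), "fold")]
        print(decide([base_a, base_b], query=(0.85, 5, 1)))   # -> "raise"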

Slumbot

  • Team Name: Slumbot
  • Team Leader: Eric Jackson
  • Team Members: Eric Jackson
  • Affiliation: Independent
  • Location: Menlo Park, CA, USA
  • Technique:
    Slumbot employs the Public Chance Sampling variant of Counterfactual Regret Minimization. We use a large abstraction with 88 billion information sets. There is no abstraction on any street prior to the river. On the river there are about 4.7 million bins.

    As a consequence of the large abstraction size and our relatively modest compute environment, our system is disk-based: regrets and accumulated probabilities are written to disk on each iteration. (A minimal disk-backed-storage sketch follows the references below.)
  • References and related papers:
    • [Johanson 2012] Efficient Nash Equilibrium Approximation through Monte Carlo Counterfactual Regret Minimization
    • [Johanson 2011] Accelerating Best Response Calculation in Large Extensive Games
    • [Zinkevich 2007] Regret Minimization in Games with Incomplete Information
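
    The storage sketch mentioned above: one way to keep per-(information set, action) regrets in a file and page them through RAM on demand, assuming a flat array layout. Slumbot's actual on-disk format and update schedule are not described, so this only illustrates the general idea.

        # Disk-backed regret table using a memory-mapped numpy array.
        import numpy as np

        NUM_ENTRIES = 1_000_000     # placeholder; the real abstraction is vastly larger
        regrets = np.memmap("regrets.dat", dtype=np.float32, mode="w+", shape=(NUM_ENTRIES,))

        def add_regret(index, delta):
            regrets[index] += delta  # touched pages are read from / written back to disk as needed

        add_regret(42, 1.5)
        regrets.flush()              # push dirty pages to disk at the end of an iteration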

ZBot

  • Team Name: ZBot
  • Team Leader: Ilkka Rajala
  • Team Members: Ilkka Rajala
  • Affiliation: Independent
  • Location: Helsinki, Finland
  • Technique:
    A counterfactual regret minimization implementation that uses two phases. In the first phase, the model is built dynamically by expanding it (observing more buckets) in situations that are visited more often, until the desired size has been reached.
    In the second phase that model is then solved by counterfactual regret minimization.

    The model has 1024 possible board-texture buckets for each street, and 169/1024/512/512 hand-type buckets for the preflop/flop/turn/river. How many buckets are actually used in any given situation depends on how common that situation is.
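
    A sketch of the frequency-driven expansion idea, under the assumption that a situation earns a finer bucketing once it has been visited often enough; ZBot's actual expansion rule is not specified beyond visit frequency, so the threshold and doubling scheme below are made up.

        # Expand the bucket count of frequently visited situations, up to a per-street cap.
        from collections import defaultdict

        MAX_BUCKETS = {"preflop": 169, "flop": 1024, "turn": 512, "river": 512}
        visits = defaultdict(int)
        buckets = defaultdict(lambda: 8)           # every situation starts coarse

        def observe(situation, street):
            visits[situation] += 1
            if visits[situation] % 1000 == 0 and buckets[situation] < MAX_BUCKETS[street]:
                buckets[situation] = min(buckets[situation] * 2, MAX_BUCKETS[street])

        for _ in range(5000):
            observe(("flop", "rrc"), "flop")       # a frequently reached betting sequence
        print(buckets[("flop", "rrc")])            # -> 256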

Heads-up No-limit Texas Hold'em

Azure Sky

  • Team Name: Azure Sky Research, Inc
  • Team Leader: Eric Baum
  • Team Members: Eric Baum, Chick Markley, Dennis Horte
  • Affiliation: Azure Sky Research Inc.
  • Location: Berkeley CA US
  • Technique:
    SARSA trained neural nets, k-armed bandits, secret sauce.

dcubot

  • Team Name: dcubot
  • Team Leader: Neill Sweeney
  • Team Members: Neill Sweeney, David Sinclair
  • Affiliation: School of Computing, Dublin City University
  • Location: Dublin 9, Ireland.
  • Technique:
    The bot uses a structure like a neural net to generate its own actions. A hidden Markov model is used to interpret actions, i.e. to read an opponent's hand. The whole system is then trained by self-play.
    For any decision, the range of bet sizes between a min-bet and all-in is divided into at most twelve sub-ranges. The structure then selects fold, call, min-bet, all-in, or one of these sub-ranges. If a sub-range is selected, the actual raise amount is drawn from a quadratic distribution between the end-points of the sub-range. The end-points of the sub-ranges are learnt using the same reinforcement learning algorithm as the rest of the structure.
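
    For illustration, one way to draw a raise amount from a quadratic distribution over a sub-range. The exact shape of dcubot's distribution is not stated, so this assumes a density proportional to u^2 on the unit interval (weighted toward the top of the sub-range), sampled by inverting the CDF.

        # Sample a raise size from a quadratic density over [lo, hi].
        import random

        def sample_quadratic(lo, hi):
            u = random.random() ** (1.0 / 3.0)   # inverse CDF of F(u) = u^3, density 3u^2
            return lo + (hi - lo) * u

        print(sample_quadratic(6.0, 10.0))       # e.g. a sub-range of 6 to 10 big blinds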

hugh

  • Team Name: hugh
  • Team Leader: Stan Sulsky
  • Team Members: Stan Sulsky, Ben Sulsky
  • Affiliation: Independent
  • Location: NY, US & Toronto, Ont, CA
  • Technique:
    Ben (poker player and son) attempts to teach Stan (programmer and father) to play poker. Stan attempts to realize Ben's ideas in code.

    More specifically, pure strategies are used throughout. Play is based on range-versus-range EV calculations. Preflop ranges are deduced by opponent modelling during play. Subsequent decisions are based on a minimax search of the remaining game tree, coupled with some tactical considerations.
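
    A stripped-down range-versus-range EV calculation of the kind described: weight every pair of holdings by how likely each range is to hold it and average the hero's share of the pot. The ranges, the equity function and the card-removal handling below are placeholders, not hugh's model.

        # Range-vs-range expected value with a supplied equity function.
        def range_vs_range_ev(hero_range, villain_range, pot, equity):
            """hero_range / villain_range: {holding: probability}; equity(h, v) in [0, 1]."""
            total, ev = 0.0, 0.0
            for h, ph in hero_range.items():
                for v, pv in villain_range.items():
                    if h == v:                       # crude card-removal placeholder
                        continue
                    w = ph * pv
                    ev += w * equity(h, v) * pot     # hero's share of the pot
                    total += w
            return ev / total if total else 0.0

        toy_equity = lambda h, v: 0.8 if h == "AA" else 0.4
        hero = {"AA": 0.2, "KQ": 0.8}
        villain = {"AA": 0.1, "77": 0.9}
        print(range_vs_range_ev(hero, villain, pot=10.0, equity=toy_equity))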

Hyperborean2pNL

  • Team Name: University of Alberta
  • Team Leader: Michael Bowling
  • Team Members: Michael Bowling, Duane Szafron, Rob Holte, Chris Archibald, Michael Johanson, Nolan Bard, Johnny Hawkin, Richard Gibson, Neil Burch, Parisa Mazrooei, Josh Davidson
  • Affiliation: University of Alberta
  • Location: Edmonton, Alberta, Canada
  • Technique:
    Our 2-player no-limit bot was built using a variant of Counterfactual Regret Minimization (CFR) ([3], [4]) applied to a specially designed betting abstraction of the game. Using an algorithm similar to CFR, a different bet size is chosen for each betting sequence in the game ([1], [2]). The card abstraction buckets hands and public cards together using imperfect recall, allowing for 18,630 possible buckets on each of the flop, turn and river.
  • References and related papers:
    • Hawkin, J.; Holte, R.; and Szafron, D. 2011. Automated action abstraction of imperfect information extensive-form games. In AAAI, 681–687.
    • Hawkin, J.; Holte, R.; and Szafron, D. 2012. Using Sliding Windows to Generate Action Abstractions in Extensive-Form Games. To appear, AAAI '12.
    • Michael Johanson, Nolan Bard, Marc Lanctot, Richard Gibson, and Michael Bowling. "Efficient Nash Equilibrium Approximation through Monte Carlo Counterfactual Regret Minimization" In AAMAS 2012
    • Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. "Regret minimization in games with incomplete information" In NIPS 2008.

LittleRock

  • Team Name: LittleRock
  • Team Leader: Rod Byrnes
  • Team Members: Rod Byrnes
  • Affiliation: Independent
  • Location: Lismore, Australia
  • Technique:
    LittleRock uses an external sampling Monte Carlo CFR approach with imperfect recall. Additional RAM was available for training the agent entered into this year's competition, which allowed for a finer-grained card abstraction, but the algorithm is otherwise largely unchanged. One last-minute addition this year is a no-limit agent.

    The no-limit agent has 4,491,849 information sets, the heads-up limit agent has 11,349,052 information sets and the limit 3-player agent has 47,574,530 information sets. In addition to card abstractions, the 3-player and no-limit agents also use a form of state abstraction to make the game size manageable.
  • References and related papers:
    • Monte Carlo Sampling for Regret Minimization in Extensive Games. Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. In Advances in Neural Information Processing Systems 22 (NIPS), pp. 1078–1086, 2009.

Lucky7_12

  • Team Name: Lucky7_12
  • Team Leader: Bojan Butolen
  • Team Members: Bojan Butolen, Gregor Vohl
  • Affiliation: University of Maribor
  • Location: Maribor, Slovenia
  • Technique:
  • We have developed a multi-agent system that uses 8 strategies during gameplay. By identifying the state of the game, our system chooses the set of strategies that have proved most profitable against a set of training agents. The final decision of the system is made by averaging the decisions of the individual agents.

    The 8 agents included in our system are mostly rule-based agents. The rules for each individual agent were constructed using different knowledge bases (various match logs, expert knowledge, human-observed play...) and different abstraction definitions for cards and actions. After a set of test matches in which each agent dueled against the other agents in the system, we determined that none of the included agents presents a strictly inferior or superior strategy (each agent lost to at least one of the other agents and won at least one match).
  • References and related papers:
    • A submission to the Poker Symposium was made with the title: Combining Various Strategies In A Poker Playing Multi Agent System

Neo Poker Bot

  • Team Name: Neo Poker Laboratory
  • Team Leader: Alexander Lee
  • Team Members: Alexander Lee
  • Affiliation: Independent
  • Location: Spain
  • Technique:
    Our range of computer players was developed to play against humans. The AI was trained on real-money hand-history logs from top poker rooms. The AI logic employs different combinations of neural networks, regret minimization and gradient-search equilibrium approximation, decision trees, and recursive search methods, as well as expert algorithms from top players in different games of poker.
    Our computer players have been tested against humans and demonstrated great results over 100 million hands. The AI was not optimized to play against computer players.

SartreNL

  • Team Name: Sartre
  • Team Leader: Jonathan Rubin
  • Team Members: Jonathan Rubin, Ian Watson
  • Affiliation: University of Auckland
  • Location: Auckland, New Zealand
  • Technique:
    SartreNL uses a case-based approach to play no-limit Texas Hold'em. Hand history data from the previous years' top agents are encoded into cases. When it is time for SartreNL to make a betting decision, a case with the current game-state information is created. The case-base is then searched for similar cases, and the solutions to similar past cases are re-used for the current situation.
  • References and related papers:
    • Jonathan Rubin and Ian Watson. (2011). Successful Performance via Decision Generalisation in No Limit Texas Hold'em. In Case-Based Reasoning. Research and Development, 19th International Conference on Case-Based Reasoning, ICCBR 2011.

Spewie Louie

  • Team Name: Spewie Louie
  • Team Leader: Jon Parker
  • Team Members: Jon Parker
  • Affiliation: Georgetown University
  • Location: Washington DC, USA
  • Technique:
    The bot assumes bets can occur in .25x, .4286x, .6666x, 1x, 1.5x, 4x, and 9x pot increments. Nodes in the tree contain a hand range for each player, an "effectiveMatrix" that summarizes the tree below that node, and a "strategyMatrix" which is used by the "hero" of that node. Prior to the competition, a collection of 24 million matrices (1/2 strategy and 1/2 effective) was refined while simulating roughly 12.5 million paths through the tree. This set of 24 million matrices was then trimmed down to 770k (strategy-only) matrices for the competition. Any decision not supported by this set of matrices is handled by an "on-line" tree learner. (A sketch of fitting an observed bet onto this pot-fraction grid follows the references below.)
    During the learning process the set of effectiveMatrices and strategyMatrices is stored in a ConcurrentHashMap, which gives the learning process good multi-thread behavior.
    Preflop hands are bucketed into 22 groups. Flop and turn hands are bucketed into 8 groups. River hands are bucketed into 7 groups.
  • References and related papers:
    • Michael Johanson's Master's thesis, "Robust Strategies and Counter-Strategies: Building a Champion Level Computer Poker Player", was quite helpful, as were most of his other papers. Some of the older University of Alberta works by Darse Billings were also good reads. The book "The Mathematics of Poker" and its explanation of the AKQ game is very good.
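
    The pot-fraction sketch mentioned above: when an opponent's bet does not match one of the assumed increments, it has to be fitted onto that grid somehow. The nearest-in-log-space rule below is an assumption for illustration; Spewie Louie's own translation rule is not described.

        # Map an observed bet to the closest allowed pot fraction (log-distance).
        import math

        FRACTIONS = [0.25, 0.4286, 0.6666, 1.0, 1.5, 4.0, 9.0]

        def nearest_fraction(bet, pot):
            frac = bet / pot
            return min(FRACTIONS, key=lambda f: abs(math.log(frac) - math.log(f)))

        print(nearest_fraction(bet=35, pot=40))   # 0.875 pot -> mapped to 1.0 pot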

Tartanian5

  • Team Name: Tartanian5
  • Team Leader: Sam Ganzfried
  • Team Members: Sam Ganzfried, Tuomas Sandholm
  • Affiliation: Carnegie Mellon University
  • Location: Pittsburgh, PA, 15217, United States
  • Technique:
    Tartanian5 plays a game-theoretic approximate Nash equilibrium strategy. First, it applies a potential-aware, perfect-recall, automated abstraction algorithm to group similar game states together and construct a smaller game that is strategically similar to the full game. In order to maintain a tractable number of possible betting sequences, it employs a discretized betting model, where only a small number of bet sizes are allowed at each game state. Approximate equilibrium strategies for both players are then computed using an improved version of Nesterov's excessive gap technique specialized for poker. To obtain the final strategies, we apply a purification procedure which rounds action probabilities to 0 or 1. (A minimal purification sketch follows the references below.)
  • References and related papers:
    • Sam Ganzfried, Tuomas Sandholm, and Kevin Waugh. 2012. Strategy purification and thresholding: Effective non-equilibrium approaches for playing large games. In AAMAS.
    • Andrew Gilpin, Tuomas Sandholm, and Troels Sorensen. 2007. Potential-aware automated abstraction of sequential games, and holistic equilibrium analysis of Texas Hold'em poker. In AAAI.
    • Andrew Gilpin, Tuomas Sandholm, and Troels Sorensen. 2008. A heads-up no-limit Texas Hold'em poker player: Discretized betting models and automatically generated equilibrium-finding programs. In AAMAS.
    • Samid Hoda, Andrew Gilpin, Javier Pena, and Tuomas Sandholm. 2010. Smoothing techniques for computing Nash equilibria of sequential games. Mathematics of Operations Research 35(2):494-512.
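
    The purification step is simple enough to show directly: the abstract equilibrium's mixed action probabilities at a decision point are rounded so that only the most probable action is played. This sketch shows that rounding on a single hypothetical decision point.

        # Purify a mixed strategy: play the highest-probability action with probability 1.
        def purify(strategy):
            best = max(strategy, key=strategy.get)
            return {a: (1.0 if a == best else 0.0) for a in strategy}

        print(purify({"fold": 0.10, "call": 0.55, "raise": 0.35}))
        # {'fold': 0.0, 'call': 1.0, 'raise': 0.0}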

UNI-MB_Poker

  • Team Name: UNI-MB_Poker
  • Team Leader: Ale ?ep
  • Team Members: Ale ?ep, Davor Gaberek
  • Affiliation: University of Maribor
  • Location: Maribor, Slovenia
  • Technique:
  • Our poker agent concentrates on winning chips from its opponent to maximize profit. It uses small raises even when it has good cards to lure its opponent into the game, bluffs in 5% of hands, and folds when the odds are not in its favor. We used two criteria for the agent's decisions: first we examine the cards we were dealt, and second we calculate the odds of winning. After combining the two results we decide what action to take. (A minimal decision sketch follows below.)
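
    The decision sketch mentioned above, assuming one plausible way to combine a win-probability estimate with the price of a call: the 5% bluff rate comes from the description, while the pot-odds comparison and the raise threshold are assumptions made for illustration.

        # Combine win probability with pot odds to choose an action (illustrative thresholds).
        import random

        def decide(win_prob, pot, to_call, bluff_rate=0.05):
            if random.random() < bluff_rate:
                return "raise"                        # occasional bluff
            pot_odds = to_call / (pot + to_call)      # break-even calling frequency
            if win_prob < pot_odds:
                return "fold"
            return "raise" if win_prob > 0.7 else "call"

        print(decide(win_prob=0.55, pot=10, to_call=2))   # usually "call"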



3-player Limit Texas Hold'em

dcubot

  • Team Name: dcubot
  • Team Leader: Neill Sweeney
  • Team Members: Neill Sweeney, David Sinclair
  • Affiliation: School of Computing, Dublin City University
  • Location: Dublin 9, Ireland.
  • Technique:
    The bot uses four separate connectionist structures, one for each betting round. Ten input features describe the state of the betting after each legal decision, and there are over 300 basic features describing the visible cards. Reading opponent hands is handled by fitting a hidden Markov model to the play, by maximum likelihood, with the cards hidden. A belief vector over the hidden variable is then used as an additional input.

    This year we have increased the size of the structure by doubling the hidden layer.
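
    For illustration, the generic forward (belief-update) step of a hidden Markov model, which is the kind of computation that produces the belief vector described above. The transition and emission tables here are invented, not dcubot's fitted model.

        # One HMM forward step per observed opponent action.
        import numpy as np

        T = np.array([[0.9, 0.1],       # transitions between two hidden hand-strength states
                      [0.2, 0.8]])
        E = np.array([[0.7, 0.2, 0.1],  # P(action | state); actions 0=fold-ish, 1=call-ish, 2=raise-ish
                      [0.1, 0.3, 0.6]])

        def update_belief(belief, action):
            predicted = belief @ T                 # predict the next hidden state
            weighted = predicted * E[:, action]    # weight by the action's likelihood
            return weighted / weighted.sum()

        belief = np.array([0.5, 0.5])
        for action in [2, 2, 1]:                   # an observed betting line
            belief = update_belief(belief, action)
        print(np.round(belief, 3))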

Hyperborean3p

  • Team Name: Hyperborean3p
  • Team Leader: Michael Bowling
  • Team Members: Michael Bowling, Duane Szafron, Rob Holte, Chris Archibald, Michael Johanson, Nolan Bard, Johnny Hawkin, Richard Gibson, Neil Burch, Parisa Mazrooei, Josh Davidson
  • Affiliation: University of Alberta
  • Location: Edmonton, Alberta, Canada
  • Technique:
    Our 3-player program is built using the External Sampling (ES) [2] variant of Counterfactual Regret Minimization [3]. ES is applied to an abstract game constructed from two different card abstractions of Texas Hold'em, producing a dynamic expert strategy [1]. The first card abstraction is very fine and allows our program to distinguish between many different possible hands on each round, whereas the second card abstraction is much coarser and merges many different hands into the same information set. The first abstraction is applied to the "important" parts of the betting tree, where importance is determined by the pot size and the frequency with which our program reached the betting sequence in last year's competition. The second, coarser abstraction is applied elsewhere.
  • References and related papers:
    • Richard Gibson and Duane Szafron. On strategy stitching in large extensive form multiplayer games. In NIPS 2011.
    • Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. Monte Carlo sampling for regret minimization in extensive games. In NIPS 2009.
    • Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. Regret minimization in games with incomplete information. In NIPS 2008.

LittleRock

  • Team Name: LittleRock
  • Team Leader: Rod Byrnes
  • Team Members: Rod Byrnes
  • Affiliation: Independent
  • Location: Lismore, Australia
  • Technique:
    LittleRock uses an external sampling Monte Carlo CFR approach with imperfect recall. Additional RAM was available for training the agent entered into this year's competition, which allowed for a finer-grained card abstraction, but the algorithm is otherwise largely unchanged. One last-minute addition this year is a no-limit agent.

    The no-limit agent has 4,491,849 information sets, the heads-up limit agent has 11,349,052 information sets and the limit 3-player agent has 47,574,530 information sets. In addition to card abstractions, the 3-player and no-limit agents also use a form of state abstraction to make the game size manageable.
  • References and related papers:
    • Monte Carlo Sampling for Regret Minimization in Extensive Games. Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. In Advances in Neural Information Processing Systems 22 (NIPS), pp. 1078–1086, 2009.

Neo Poker Bot

  • Team Name: Neo Poker Laboratory
  • Team Leader: Alexander Lee
  • Team Members: Alexander Lee
  • Affiliation: Independent
  • Location: Spain
  • Technique:
    Our range of computer players was developed to play against humans. The AI was trained on real-money hand-history logs from top poker rooms. The AI logic employs different combinations of neural networks, regret minimization and gradient-search equilibrium approximation, decision trees, and recursive search methods, as well as expert algorithms from top players in different games of poker.
    Our computer players have been tested against humans and demonstrated great results over 100 million hands. The AI was not optimized to play against computer players.


Sartre3P

  • Team Name: Sartre
  • Team Leader: Jonathan Rubin
  • Team Members: Jonathan Rubin, Ian Watson
  • Affiliation: University of Auckland
  • Location: Auckland, New Zealand
  • Technique:
    Sartre3P uses a case-based approach to play Texas Hold'em. AAAI hand history data from both three-player and two-player matches are encoded into separate case-bases. When a playing decision is required, a case with the current game state information is created. If no opponents have folded, Sartre3P will search the three-player case-base for similar game scenarios for a solution. On the other hand, if an opponent has folded, Sartre3P will search the two-player case-base and switch to a heads-up strategy if it is possible to map the three-player betting sequence to an appropriate two-player sequence.
  • References and related papers:
    • Jonathan Rubin and Ian Watson. Case-Based Strategies in Computer Poker, AI Communications, Volume 25, Number 1: 19-48, March 2012.