Participants: 2014 - 3-player Limit Texas Hold'em

3-player Limit Texas Hold'em

Hyperborean (instant run-off)

  • Team Name: University of Alberta
  • Team Members: Michael Bowling, Duane Szafron, Rob Holte, Nolan Bard, Neil Burch, Richard Gibson, John Hawkin, Michael Johanson, Trevor Davis, Josh Davidson, Dustin Morrill
  • Affiliation: University of Alberta
  • Location: Edmonton, Alberta, Canada
  • Non-dynamic Agent
  • Technique: (NOTE: This agent is the same as the 2013 ACPC's 3-player instant run-off Hyperborean entry.)

    Hyperborean2014-3pl-IRO is a Nash equilibrium approximation trained using
    PureCFR [1, Section 5.5], a recent CFR variant developed by Oskari Tammelin.
    Because 3-player hold'em is too large a game to apply CFR techniques directly,
    we employed an abstract game that merges card deals into "buckets" to create a
    game of manageable size.
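
    As an illustration of the update rule only (the competition agent runs on the abstracted 3-player game described below), here is a minimal Pure CFR sketch on two-player Kuhn poker. The game, the constants, and the average-strategy bookkeeping are simplified assumptions relative to [1]:

        import random
        from collections import defaultdict

        # Two-player Kuhn poker: cards 0..2, actions p (pass/fold) and b (bet/call).
        ACTIONS = "pb"
        regrets = defaultdict(lambda: [0, 0])       # info set -> per-action regret
        strategy_sum = defaultdict(lambda: [0, 0])  # info set -> pure-action tallies

        def regret_matching(r):
            pos = [max(x, 0) for x in r]
            total = sum(pos)
            return [p / total for p in pos] if total > 0 else [0.5, 0.5]

        def sample(probs):
            x, acc = random.random(), 0.0
            for i, p in enumerate(probs):
                acc += p
                if x < acc:
                    return i
            return len(probs) - 1

        def payoff(cards, h):
            """Player 0's utility at a terminal history, or None if not terminal."""
            if h in ("pp", "bb", "pbb"):            # showdown
                stake = 1 if h == "pp" else 2
                return stake if cards[0] > cards[1] else -stake
            if h == "bp":                           # player 1 folds to a bet
                return 1
            if h == "pbp":                          # player 0 folds to a bet
                return -1
            return None

        def pure_cfr(cards, h, update_player):
            u = payoff(cards, h)
            if u is not None:
                return u if update_player == 0 else -u
            player = len(h) % 2
            infoset = str(cards[player]) + h
            sigma = regret_matching(regrets[infoset])
            if player != update_player:
                # Opponent nodes: sample one pure action and tally it into the
                # average strategy; under the pure profile all values are integers.
                a = sample(sigma)
                strategy_sum[infoset][a] += 1
                return pure_cfr(cards, h + ACTIONS[a], update_player)
            # Update player: evaluate every action, sample one as the action
            # actually played, and accumulate integer regret differences.
            values = [pure_cfr(cards, h + a, update_player) for a in ACTIONS]
            played = sample(sigma)
            for a in range(len(ACTIONS)):
                regrets[infoset][a] += values[a] - values[played]
            return values[played]

        for _ in range(200000):
            deal = random.sample(range(3), 2)       # one pure chance outcome
            for p in (0, 1):
                pure_cfr(deal, "", p)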

    To create our abstract game, we first partitioned the betting sequences into two parts: an "important" part and an "unimportant" part. Importance was determined by the frequency with which our 3-player programs from the 2011 and 2012 ACPCs faced a decision at each betting sequence, and by the number of chips in the pot. We then employed a different granularity of abstraction for each part of this partition: the unimportant part used 169, 180,000, 18,630, and 875 buckets per betting round respectively, while the important part used 169, 1,348,620, 1,530,000, and 2,800,000 buckets per betting round respectively.

    Buckets were calculated according to public card textures and k-means clustering over hand strength distributions [3], yielding an imperfect recall abstract game that forgets previous card information and rebuckets on every round [4]. The agent plays the "current strategy profile" computed from approximately 303.6 billion iterations of the PureCFR variant of CFR [1] applied to this abstract game. This type of strategy is also known as a "dynamic expert strategy" [2].
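
    The k-means bucketing step can be sketched briefly. The following uses scikit-learn's KMeans with squared Euclidean distance as a stand-in for the earth mover's distance clustering of [3], and random histograms in place of real hand-strength data:

        import numpy as np
        from sklearn.cluster import KMeans

        # Stand-in data: one row per canonical hand, each row a histogram over
        # future hand-strength outcomes (the real features come from rolling
        # out the remaining board cards).
        rng = np.random.default_rng(0)
        histograms = rng.dirichlet(np.ones(10), size=5000)

        # Hands clustered into the same bucket become indistinguishable in the
        # abstract game; rebucketing independently on every round is what makes
        # the abstraction imperfect recall.
        kmeans = KMeans(n_clusters=169, n_init=10, random_state=0).fit(histograms)
        bucket_of_hand = kmeans.labels_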


    [1] Richard Gibson. Regret Minimization in Games and the Development of Champion Multiplayer Computer Poker-Playing Agents. PhD Thesis. University of Alberta, 2013.

    [2] Richard Gibson and Duane Szafron.  On Strategy Stitching in Large Extensive Form Multiplayer Games.  In Proceedings of the Twenty-Fifth Conference on Neural Information Processing Systems (NIPS), 2011.

    [3] Michael Johanson, Neil Burch, Richard Valenzano, and Michael Bowling. Evaluating state-space abstractions in extensive-form games. In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 271–278, 2013.

    [4] Kevin Waugh, Martin Zinkevich, Michael Johanson, Morgan Kan, David Schnizlein, and Michael Bowling. A Practical Use of Imperfect Recall.  In Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (SARA), 2009.

Hyperborean (total bankroll)

  • Team Name: University of Alberta
  • Team Members: Michael Bowling, Duane Szafron, Rob Holte, Nolan Bard, Neil Burch, Richard Gibson, John Hawkin, Michael Johanson, Trevor Davis, Josh Davidson, Dustin Morrill
  • Affiliation: University of Alberta
  • Location: Edmonton, Alberta, Canada
  • Non-dynamic Agent
  • Technique: (NOTE: This agent is the same as the 2013 ACPC's 3-player total bankroll Hyperborean entry.)

    Hyperborean2014-3pl-TBR is a data biased response to aggregate data on ACPC competitors from the 2011 and 2012 3-player limit competitions [2]. The strategy was generated using the Counterfactual Regret Minimization (CFR) algorithm [6]. Asymmetric abstractions were used for the regret-minimizing part of each player's strategy and for the frequentist model used by data biased response [1]. Each abstraction uses imperfect recall, forgetting previous card information and rebucketing on every round [5], with the k-means Earthmover and k-means OCHS buckets recently presented by Johanson et al. [3]. The agent's strategy uses an abstraction with 169, 10,000, 5,450, and 500 buckets on each round of the game, respectively. The model of prior ACPC competitors groups observations from all competitors using 169, 900, 100, and 25 buckets per round, respectively. The agent plays the "current strategy profile" generated after 20 billion iterations of external sampling CFR [4].
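
    The heart of data biased response is to trust the frequentist opponent model at an information set only in proportion to how much data supports it, falling back to a default strategy elsewhere. Below is a minimal sketch of that per-information-set blend; the linear confidence schedule and the cap are illustrative assumptions, not the exact weighting of [2]:

        def biased_model(observed_counts, default_strategy, confidence_cap=10):
            """Blend observed opponent action frequencies with a default strategy.

            observed_counts: times each action was seen at this information set.
            default_strategy: fallback distribution used where data is scarce.
            The min(n, cap)/cap schedule is an illustrative choice.
            """
            n = sum(observed_counts)
            if n == 0:
                return list(default_strategy)
            trust = min(n, confidence_cap) / confidence_cap
            empirical = [c / n for c in observed_counts]
            return [trust * e + (1 - trust) * d
                    for e, d in zip(empirical, default_strategy)]

        # E.g. 7 folds, 2 calls, 1 raise observed -> mostly trust the data:
        print(biased_model([7, 2, 1], [1/3, 1/3, 1/3]))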



    [1] Nolan Bard, Michael Johanson, Michael Bowling.  Asymmetric Abstractions for Adversarial Settings.  In Proceedings of the Thirteenth International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2014.

    [2] Michael Johanson and Michael Bowling. Data Biased Robust Counter Strategies. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), 2009.

    [3] Michael Johanson, Neil Burch, Richard Valenzano, and Michael Bowling. Evaluating State-Space Abstractions in Extensive-Form Games. In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2013.

    [4] Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. Monte Carlo Sampling for Regret Minimization in Extensive Games. In Proceedings of the Twenty-Third Conference on Neural Information Processing Systems (NIPS), 2009.

    [5] Kevin Waugh, Martin Zinkevich, Michael Johanson, Morgan Kan, David Schnizlein, and Michael Bowling. A Practical Use of Imperfect Recall. In Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (SARA), 2009.

    [6] Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. Regret minimization in games with incomplete information. In Advances in Neural Information Processing Systems 20 (NIPS), 2007.

Learn2KEmpf (KEmpfer)

  • Team Name: KEmpfer
  • Team Members: Eneldo Loza Mencia, Julian Prommer
  • Affiliation: Knowledge Engineering Group - Technische Universität Darmstadt
  • Location: Darmstadt, Germany
  • Non-dynamic Agent
  • Technique: This agent tries to mimic the behaviour of a given poker agent. It therefore follows a strategy similar to Sartre from previous years, with two differences. First, in contrast to Sartre, which uses case-based reasoning (essentially k-nearest neighbours), we allow any learning algorithm to be used; in this particular submission, we used C4.5 to induce a model of a poker agent (more specifically, Weka's implementation J48). Second, a much more complete representation of a state is used, with up to 50 possible features. We even induce features designed to capture the opponent modelling performed by the agent being imitated.
    For this year's submission, we learned the behaviour of Hyperborean from the logs of the 2013 three-player limit competition. Since Hyperborean uses a CFR strategy, we expect our bot to behave accordingly. However, it is not possible to perfectly replicate a bot's behaviour (at least with the available data), so we expect our agent to perform worse than a corresponding CFR-based opponent in this year's competition.
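
    A minimal sketch of this imitation setup, using scikit-learn's DecisionTreeClassifier as a stand-in for Weka's J48 (both induce C4.5-style decision trees) and random rows in place of the feature vectors parsed from the competition logs:

        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        # Hypothetical training data: each row describes one decision point
        # (hand strength, position, pot size, betting history, ...) and the
        # label is the action the imitated agent took (0=fold, 1=call, 2=raise).
        rng = np.random.default_rng(1)
        X = rng.random((10000, 50))          # up to 50 features per decision
        y = rng.integers(0, 3, size=10000)   # observed expert actions

        model = DecisionTreeClassifier(min_samples_leaf=20).fit(X, y)

        def act(features):
            """Play whatever action the model predicts the expert would take."""
            return int(model.predict(np.asarray(features).reshape(1, -1))[0])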

    - Jonathan Rubin and Ian Watson. Case-Based Strategies in Computer Poker. AI Communications, 25(1):19–48, 2012.
    - Jonathan Rubin and Ian Watson. Successful Performance via Decision Generalisation in No-Limit Texas Hold'em. In Case-Based Reasoning Research and Development, volume 6880, pages 467–481. Springer Berlin Heidelberg, 2011.
    - Theo Kischka. Trainieren eines Computer-Pokerspielers [Training a Computer Poker Player]. Bachelor's thesis, Technische Universität Darmstadt, Knowledge Engineering Group, 2014. http://www.ke.tu-darmstadt.de/lehre/arbeiten/bachelor/2014/Kischka_Theo.pdf

Lucifer

  • Team Name: PokerCPT
  • Team Members: Luis Filipe Teofilo
  • Affiliation: University of Porto, Artificial Intelligence and Computer Science Laboratory
  • Location: Porto, Portugal
  • Dynamic Agent
  • Technique: The base agent's strategies are Nash equilibrium (NE) approximations. Several NE strategies were computed, and the agent switches between them to make opponent modelling difficult (especially in Kuhn3P). The NE strategies were computed with an implementation of CFR. This implementation greatly reduces the game tree by removing decisions at chance nodes where the agent knows it has a very high or very low probability of winning. For multiplayer poker, the CFR implementation also abstracts game sequences. Card buckets were grouped according to their utility in smaller games. For no-limit, actions were also abstracted into four possible decisions.
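
    As a sketch of the no-limit action abstraction, the function below maps an observed bet onto four decisions. The particular set {fold, call, pot-sized raise, all-in} is an assumption for illustration; the entry states only that actions were reduced to four:

        def abstract_action(bet, to_call, pot, stack):
            """Map an observed no-limit bet onto one of four abstract decisions."""
            if bet < to_call:
                return "fold"                 # not matching the outstanding bet
            if bet == to_call:
                return "call"                 # includes check when to_call == 0
            pot_raise, all_in = to_call + pot, stack
            # Snap any raise to whichever abstract size is closer.
            return "raise_pot" if abs(bet - pot_raise) <= abs(bet - all_in) else "all_in"

        print(abstract_action(bet=60, to_call=10, pot=30, stack=200))  # raise_pot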

SmooCT

  • Team Name: SmooCT
  • Team Members: Johannes Heinrich
  • Affiliation: University College London
  • Location: London, UK
  • Non-dynamic Agent
  • Technique: SmooCT was trained by self-play Monte-Carlo tree search, using Smooth UCT [2]. The agent uses an imperfect recall abstraction [1] based on an equidistant discretisation of expected hand strength squared (E[HS²]) values. The abstraction uses 169 and 1000 buckets for the first two betting rounds. For the turn and river, the abstraction granularity was locally refined based on the number of visits to a node during self-play training; the numbers of turn and river buckets lie in [100, 400] and [10, 160], respectively.
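
    Two pieces of this description admit short sketches: the equidistant E[HS²] discretisation, and Smooth UCT's action selection, which mixes UCT's UCB choice with the empirical average strategy. Both are illustrative; the node interface, the constants, and the fixed mixing parameter are assumptions (in [2] the mixing is annealed over time):

        import math
        import random
        from dataclasses import dataclass, field

        def ehs2_bucket(ehs2, n_buckets):
            """Equidistant discretisation: E[HS^2] in [0,1] -> equal-width bucket."""
            return min(int(ehs2 * n_buckets), n_buckets - 1)

        @dataclass
        class Node:
            actions: list
            q: dict = field(default_factory=dict)   # mean return per action
            n: dict = field(default_factory=dict)   # visit count per action
            visits: int = 0

        def smooth_uct_select(node, c=2.0, eta=0.9):
            """With probability eta take the UCB-maximising action (plain UCT);
            otherwise sample from the empirical average strategy."""
            for a in node.actions:
                node.q.setdefault(a, 0.0)
                node.n.setdefault(a, 0)
            if node.visits == 0 or random.random() < eta:
                return max(node.actions, key=lambda a: node.q[a]
                           + c * math.sqrt(math.log(node.visits + 1) / (node.n[a] + 1)))
            r, acc = random.random() * node.visits, 0
            for a in node.actions:
                acc += node.n[a]
                if r < acc:
                    return a
            return node.actions[-1]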


    [1] Kevin Waugh, Martin Zinkevich, Michael Johanson, Morgan Kan, David Schnizlein, and Michael Bowling. "A Practical Use of Imperfect Recall". Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (SARA), 2009.
    [2] Johannes Heinrich and David Silver. "Self-Play Monte-Carlo Tree Search in Computer Poker". To appear in 2014.