Participants: 2013 - Heads-up No-limit Texas Hold'em

Heads-up No-limit Texas Hold'em

Entropy

  • Team Name: Entropy
  • Team Leader:
  • Team Members:
  • Affiliation:
  • Location:
  • Technique:

HITSZ_CS_13

  • Team Name: HITSZ_CS_13
  • Team Leader: Xuan Wang
  • Team Members: Xuan Wang, Jiajia Zhang, Song Wu
  • Affiliation: School of Computer Science and Technology HIT
  • Location: Shenzhen, Guangdong Province, China
  • Technique:
    Our program makes decisions according to the current hand strength and a set of precomputed probabilities, while at the same time trying to model the opponent. Once the opponent model is built, the program takes advantage of it when making decisions.
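
    As an illustration only, a decision rule of this shape might look like the sketch below; the thresholds, helper names, and the way the opponent model shifts them are hypothetical, not the team's actual values.

      import random

      # Hypothetical win-probability thresholds per betting round -- purely illustrative.
      CALL_THRESHOLD = {"preflop": 0.45, "flop": 0.50, "turn": 0.55, "river": 0.60}
      RAISE_THRESHOLD = {"preflop": 0.65, "flop": 0.70, "turn": 0.75, "river": 0.80}

      def decide(hand_strength, betting_round, opponent_aggression=None):
          """Choose an action from the current hand strength.

          hand_strength: estimated probability of winning at showdown (0..1).
          opponent_aggression: optional statistic from an opponent model; once the
          model is built, the thresholds are shifted to exploit it.
          """
          call_t = CALL_THRESHOLD[betting_round]
          raise_t = RAISE_THRESHOLD[betting_round]
          if opponent_aggression is not None:
              # Against a more aggressive opponent, continue with weaker hands.
              call_t -= 0.05 * opponent_aggression
          if hand_strength >= raise_t:
              return "raise"
          if hand_strength >= call_t:
              return "call"
          # Occasionally bluff instead of folding so the strategy is not trivially readable.
          return "raise" if random.random() < 0.05 else "fold"

      print(decide(0.62, "turn"))  # -> "call" with these illustrative thresholds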

hugh

  • Team Name: hugh
  • Team Leader: Stan Sulsky
  • Team Members: Stan Sulsky, Ben Sulsky
  • Affiliation: Independent, University of Toronto
  • Location: New York NY, Toronto
  • Technique:
    We attempt to deduce our opponent's strategy from its actions, and apply expert tactics to exploit that strategy. On later streets this is done by exploring the remaining game tree; on early streets it is based on heuristics.

    This version of hugh is experimental and is not expected to do particularly well.

Hyperborean2pn.iro

  • Team Name: University of Alberta
  • Team Leader: Michael Bowling
  • Team Members: Richard Gibson, Joshua Davidson, Michael Johanson, Nolan Bard, Neil Burch, John Hawkin, Trevor Davis, Christopher Archibald, Michael Bowling, Duane Szafron, Rob Holte
  • Affiliation: University of Alberta
  • Location: Edmonton, Alberta, Canada
  • Technique:
    This agent is a meta-player that switches between 2 different strategies. A default strategy is played until we have seen the opponent make a minimum-sized bet on at least 1% of the hands played so far (a min bet as the first bet of the game is not counted). At this time, we switch to an alternative strategy that both makes min bets itself and better understands min bets.
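
    A minimal sketch of that switching rule follows; the class and variable names are ours, and the bookkeeping around the first bet of the game is an assumption for illustration, not taken from the agent's code.

      MIN_BET_FREQUENCY_THRESHOLD = 0.01  # switch once min bets appear on >= 1% of hands

      class StrategySwitcher:
          def __init__(self, default_strategy, min_bet_strategy):
              self.default_strategy = default_strategy
              self.min_bet_strategy = min_bet_strategy
              self.hands_played = 0
              self.hands_with_min_bet = 0
              self.switched = False

          def record_hand(self, opponent_made_min_bet, min_bet_was_first_bet):
              """Called once per completed hand with two observed flags."""
              self.hands_played += 1
              # A min bet that was the first bet of the game does not count.
              if opponent_made_min_bet and not min_bet_was_first_bet:
                  self.hands_with_min_bet += 1
              frequency = self.hands_with_min_bet / self.hands_played
              if not self.switched and frequency >= MIN_BET_FREQUENCY_THRESHOLD:
                  self.switched = True  # the switch is permanent once triggered

          def current_strategy(self):
              return self.min_bet_strategy if self.switched else self.default_strategy

      switcher = StrategySwitcher("default", "min-bet-aware")
      switcher.record_hand(opponent_made_min_bet=True, min_bet_was_first_bet=False)
      print(switcher.current_strategy())  # -> "min-bet-aware"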

    Both strategies were computed using Counterfactual Regret Minimization (CFR) [Zinkevich et al., NIPS 2007]. Because 2-player no-limit hold'em is too large a game to apply CFR to directly, we employed abstract games that merge card deals into "buckets" to create a game of manageable size [Gilpin & Sandholm, AAMAS 2007]. In addition, we abstract the raise action to a number of bets relative to the pot size. Our default strategy only makes raises equal to 0.5, 0.75, 1, 1.5, 3, 6, 11, 20, or 40 times the pot size, or goes all-in, while our alternative strategy makes min raises and raises equal to 0.5, 0.75, 1, 2, 3, or 11 times the pot size, or goes all-in. When the opponent makes an action that our agent cannot, we map the action to one of our raise sizes using probabilistic translation [Schnizlein, Bowling, and Szafron, IJCAI 2009].
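
    A toy sketch of the bet-size translation step is given below; the linear weighting between neighbouring sizes is an illustrative stand-in, not the exact formula from Schnizlein, Bowling, and Szafron.

      import random

      # The default strategy's raise sizes as multiples of the pot (all-in handled separately).
      ABSTRACT_SIZES = [0.5, 0.75, 1, 1.5, 3, 6, 11, 20, 40]

      def translate(observed_pot_fraction):
          """Map an observed raise size onto one of the abstract raise sizes.

          Sizes outside the abstraction clamp to the nearest boundary; sizes in
          between are mapped randomly to one of the two neighbouring sizes.
          """
          sizes = ABSTRACT_SIZES
          if observed_pot_fraction <= sizes[0]:
              return sizes[0]
          if observed_pot_fraction >= sizes[-1]:
              return sizes[-1]
          for lo, hi in zip(sizes, sizes[1:]):
              if lo <= observed_pot_fraction <= hi:
                  prob_lo = (hi - observed_pot_fraction) / (hi - lo)
                  return lo if random.random() < prob_lo else hi

      # Example: an opponent raise of 2x pot falls between the 1.5 and 3 pot-sized raises.
      print(translate(2.0))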

    To create our abstract game for the default strategy, we first partitioned the betting sequences into two parts: an "important" part, and an "unimportant" part. Importance was determined according to the frequency with which one of our preliminary 2-player no-limit programs was faced with a decision at that betting sequence in self-play, as well as according to the number of chips in the pot. Then, we employed two different granularities of abstraction, one for each part of this partition. The unimportant part used 169, 3700, 3700, and 3700 buckets per betting round respectively, while the important part used 169, 180,000, 1,530,000, and 1,680,000 buckets per betting round respectively. Buckets were calculated according to public card textures and k-means clustering over hand strength distributions [Johanson et al., AAMAS 2013] and yielded an imperfect recall abstract game by forgetting previous card information and rebucketing on every round [Waugh et al., SARA 2009]. The strategy profile of this abstract game was computed from approximately 498 billion iterations of the "Pure CFR" variant of CFR [Richard Gibson, PhD thesis, in preparation]. This type of strategy is also known as a "dynamic expert strategy" [Gibson & Szafron, NIPS 2011]. The alternative strategy used a simple abstraction with 169, 3700, 3700, and 1175 buckets per round respectively.
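
    For intuition, the k-means bucketing step can be sketched as follows; the feature construction here is simplified (the features in Johanson et al. also incorporate public card texture), and all names are illustrative.

      import numpy as np
      from sklearn.cluster import KMeans

      def bucket_hands(hand_strength_histograms, num_buckets, seed=0):
          """Cluster hands into buckets by their hand strength distributions.

          hand_strength_histograms: array of shape (num_hands, num_bins), where each
          row is a histogram of the hand's equity over possible future board cards.
          Returns an array mapping each hand to a bucket id in [0, num_buckets).
          """
          features = np.asarray(hand_strength_histograms, dtype=float)
          kmeans = KMeans(n_clusters=num_buckets, n_init=10, random_state=seed)
          return kmeans.fit_predict(features)

      # Toy example: 1000 fake hands, each summarised by a 10-bin equity histogram.
      rng = np.random.default_rng(0)
      fake_histograms = rng.dirichlet(np.ones(10), size=1000)
      print(bucket_hands(fake_histograms, num_buckets=8)[:20])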

Hyperborean2pn.tbr

  • Team Name: University of Alberta
  • Team Leader: Michael Bowling
  • Team Members: Michael Bowling, Duane Szafron, Rob Holte, Chris Archibald, Michael Johanson, Nolan Bard, John Hawkin, Richard Gibson, Neil Burch, Josh Davidson, Trevor Davis
  • Affiliation: University of Alberta
  • Location: Edmonton, Alberta, Canada
  • Technique:
    Hyperborean is an implicit modelling agent [5] consisting of two abstract strategies. All strategies were generated using the Counterfactual Regret Minimization (CFR) algorithm [1] with imperfect recall abstractions [3]. We also abstract the raise action to a number of bets relative to the pot size. Both strategies make raises equal to 0.5, 0.75, 1, 1.5, 3, 6, 11, 20, or 40 times the pot size, or go all-in. The portfolio of strategies for the agent consists of:

    1) A Nash equilibrium approximation
    This strategy is the same as the default strategy in our heads-up no-limit IRO entry. To create our abstract game for the strategy, we first partitioned the betting sequences into two parts: an "important" part, and an "unimportant" part. Importance was determined according to the frequency with which one of our preliminary 2-player no-limit programs was faced with a decision at that betting sequence in self-play, as well as according to the number of chips in the pot. Then, we employed two different granularities of abstraction, one for each part of this partition. The unimportant part used 169, 3700, 3700, and 3700 buckets per betting round respectively, while the important part used 169, 180,000, 1,530,000, and 1,680,000 buckets per betting round respectively. Buckets were calculated according to public card textures and k-means clustering over hand strength distributions [6] and yielded an imperfect recall abstract game by forgetting previous card information and rebucketing on every round [3]. The strategy profile of this abstract game was computed from approximately 498 billion iterations of the "Pure CFR" variant of CFR [Richard Gibson, PhD thesis, in preparation]. This type of strategy is also known as a "dynamic expert strategy" [7].

    2) A data biased response to aggregate data of 2011 and 2012 ACPC competitors
    The exploitive response in the portfolio was created using data biased robust counter strategies [8] to aggregate data from all of the agents in the 2011 and 2012 heads-up no-limit ACPC events. It uses the same betting abstraction as the above Nash equilibrium approximation, but the card abstraction consists of 169, 9000, 9000, and 3700 buckets per betting round uniformly across the game tree.

    A mixture of these agents is dynamically generated using a slightly modified Exp4-like algorithm [4] where the reward vector for the experts/strategies is computed using importance sampling over the individual strategies [2].
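
    A bare-bones sketch of an Exp3/Exp4-style mixture over the portfolio is shown below; the learning rate, exploration parameter, and the reward numbers are placeholders, and the importance-sampled reward estimation itself is left abstract.

      import math
      import random

      class StrategyMixer:
          """Exp3-style weighting over a small portfolio of strategies."""

          def __init__(self, num_strategies, learning_rate=0.05, exploration=0.05):
              self.weights = [1.0] * num_strategies
              self.learning_rate = learning_rate
              self.exploration = exploration

          def probabilities(self):
              total = sum(self.weights)
              k = len(self.weights)
              return [(1 - self.exploration) * w / total + self.exploration / k
                      for w in self.weights]

          def choose(self):
              """Pick the strategy to follow for the next hand."""
              return random.choices(range(len(self.weights)), weights=self.probabilities())[0]

          def update(self, estimated_rewards):
              """estimated_rewards: per-strategy reward estimates for the last hand,
              e.g. obtained off-policy by importance sampling, scaled to roughly [0, 1]."""
              for i, reward in enumerate(estimated_rewards):
                  self.weights[i] *= math.exp(self.learning_rate * reward)

      mixer = StrategyMixer(num_strategies=2)
      chosen = mixer.choose()    # index of the strategy to play this hand
      mixer.update([0.6, 0.4])   # illustrative importance-sampled reward estimates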
  • References and related papers:
    1. Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. "Regret Minimization in Games with Incomplete Information". In NIPS, 2008.
    2. Michael Bowling, Michael Johanson, Neil Burch, and Duane Szafron. "Strategy Evaluation in Extensive Games with Importance Sampling". In Proceedings of the 25th Annual International Conference on Machine Learning (ICML), 2008.
    3. Kevin Waugh, Martin Zinkevich, Michael Johanson, Morgan Kan, David Schnizlein, and Michael Bowling. "A Practical Use of Imperfect Recall". Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (SARA), 2009.
    4. P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. "Gambling in a rigged casino: The adversarial multi-armed bandit problem". In Proceedings of the 36th Annual Symposium on Foundations of Computer Science (FOCS), 1995.
    5. Nolan Bard, Michael Johanson, Neil Burch, Michael Bowling. "Online Implicit Agent Modelling". In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2013.
    6. Michael Johanson, Neil Burch, Richard Valenzano, and Michael Bowling. "Evaluating State-Space Abstractions in Extensive-Form Games". In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2013.
    7. Richard Gibson and Duane Szafron. "On Strategy Stitching in Large Extensive Form Multiplayer Games". In Proceedings of the Twenty-Fifth Conference on Neural Information Processing Systems (NIPS), 2011.
    8. Michael Johanson and Michael Bowling. "Data Biased Robust Counter Strategies". In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), 2009.

KEmpfer

  • Team Name: KEmpfer
  • Team Leader: Eneldo Loza Mencia
  • Team Members: Eneldo Loza Mencia, Tomek Gasiorowski, Peter Glockner, Julian Prommer
  • Affiliation: Knowledge Engineering Group, Technische Universitat Darmstadt
  • Location: Darmstadt, Germany
  • Technique:
    The agent implements and follows a list of expert rules. Opponent statistics are additionally collected for use in some of the rules, but those rules are currently disabled. If no expert rule applies, the backup strategy is to play according to the expected hand strength.
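
    A sketch of such a rule list with a hand-strength fallback is given below; the rules, thresholds, and state fields are invented placeholders, not KEmpfer's actual rules.

      def premium_preflop_rule(state):
          # Example expert rule: raise premium hands before the flop.
          if state["street"] == "preflop" and state["hand_strength"] > 0.8:
              return "raise"
          return None  # rule does not apply

      EXPERT_RULES = [premium_preflop_rule]

      def decide(state):
          """state: dict with fields such as 'street', 'hand_strength' and 'facing_bet',
          plus any collected opponent statistics (all names here are illustrative)."""
          for rule in EXPERT_RULES:
              action = rule(state)
              if action is not None:
                  return action
          # Backup strategy: play according to the expected hand strength.
          if state["hand_strength"] > 0.7:
              return "raise"
          if state["hand_strength"] > 0.4 or not state["facing_bet"]:
              return "call"  # check when not facing a bet
          return "fold"

      print(decide({"street": "flop", "hand_strength": 0.55, "facing_bet": True}))  # -> "call"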

Koypetitor

  • Team Name: Koypetitor
  • Team Leader: Adrian Koy
  • Team Members: Adrian Koy, Andrej Kuttruf, assistants
  • Affiliation: Independent
  • Location: London, United Kingdom
  • Technique:

LIACC

  • Team Name: LIACC
  • Team Leader: Luis Filipe Teofilo
  • Team Members: Luis Filipe Teofilo
  • Affiliation: University of Porto, Artificial Intelligence and Computer Science Laboratory
  • Location: Porto, Portugal
  • Technique: Expected value maximization with game partition

Little Rock

  • Team Name: Little Rock
  • Team Leader: Rod Byrnes
  • Team Members: Rod Byrnes
  • Affiliation: Independent
  • Location: Goonellabah, NSW, Australia
  • Technique:
    Little Rock uses an external sampling Monte Carlo CFR approach with imperfect recall. All agents in this year's competition use the same card abstraction, which has 8192 buckets on each of the flop, turn and river, created by clustering all possible hands using a variety of metrics from the current and previous rounds. The 2-player limit agent uses no action abstractions. The other two agents use what I call a "cross-sectional" approach, which abstracts aspects of the current game state rather than translating individual actions (which is what I call a "longitudinal" approach).
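
    Little Rock's real abstraction and game are far larger, but the external sampling MCCFR update itself can be illustrated on Kuhn poker; everything below is a generic sketch of the algorithm from Lanctot et al. [1], not the agent's code.

      import random
      from collections import defaultdict

      ACTIONS = ["p", "b"]  # pass/check and bet/call

      class Node:
          def __init__(self):
              self.regret = [0.0, 0.0]
              self.strategy_sum = [0.0, 0.0]

          def strategy(self):
              positive = [max(r, 0.0) for r in self.regret]
              total = sum(positive)
              return [p / total for p in positive] if total > 0 else [0.5, 0.5]

      nodes = defaultdict(Node)

      def terminal_value(cards, history):
          """Payoff for player 0 at a terminal history, or None if not terminal."""
          if history in ("pp", "bb", "pbb"):
              payoff = 1 if history == "pp" else 2
              return payoff if cards[0] > cards[1] else -payoff
          if history == "bp":   # player 1 folds
              return 1
          if history == "pbp":  # player 0 folds
              return -1
          return None

      def traverse(cards, history, traverser):
          value = terminal_value(cards, history)
          if value is not None:
              return value if traverser == 0 else -value
          player = len(history) % 2
          node = nodes[str(cards[player]) + history]
          strat = node.strategy()
          if player == traverser:
              # Traverser: explore every action and update regrets.
              action_values = [traverse(cards, history + a, traverser) for a in ACTIONS]
              node_value = sum(p * v for p, v in zip(strat, action_values))
              for i in range(len(ACTIONS)):
                  node.regret[i] += action_values[i] - node_value
              return node_value
          # Opponent: accumulate the average strategy and sample a single action.
          for i in range(len(ACTIONS)):
              node.strategy_sum[i] += strat[i]
          sampled = random.choices(ACTIONS, weights=strat)[0]
          return traverse(cards, history + sampled, traverser)

      for _ in range(100000):
          cards = random.sample([0, 1, 2], 2)  # chance is sampled as well
          for traverser in (0, 1):
              traverse(cards, "", traverser)

      # The normalised strategy_sum approximates a Nash equilibrium of Kuhn poker.
      for key in sorted(nodes):
          total = sum(nodes[key].strategy_sum)
          if total > 0:
              print(key, [round(s / total, 3) for s in nodes[key].strategy_sum])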
  • References and related papers:
    1. Monte Carlo Sampling for Regret Minimization in Extensive Games. Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. In Advances in Neural Information Processing Systems 22 (NIPS), pp. 1078-1086, 2009.

Neo Poker Bot

  • Team Name: Neo Poker Laboratory
  • Team Leader: Alexander Lee
  • Team Members: Alexander Lee
  • Affiliation: Independent
  • Location:
  • Technique:
    The AI logic employs different combinations of neural networks, regret minimization and gradient search equilibrium approximation, decision trees, and recursive search methods, as well as expert algorithms from professional poker players. Neo analyzes accumulated statistical data, which allows the AI to adjust its style of play against opponents.

Nyx

  • Team Name: Nyx
  • Team Leader: Matej Moravcik
  • Team Members: Matej Moravcik, Martin Schmid
  • Affiliation: Charles University
  • Location: Prague, Czech Republic
  • Technique:
    Implementation of counterfactual regret minimization.

Sartre

  • Team Name: Sartre
  • Team Leader: Kevin Norris
  • Team Members: Kevin Norris, Jonathan Rubin, Ian Watson
  • Affiliation: University of Auckland
  • Location: Auckland, New Zealand
  • Technique:
  • References and related papers:

Slumbot

  • Team Name: Slumbot
  • Team Leader: Eric Jackson
  • Team Members: Eric Jackson
  • Affiliation: Independent
  • Location: Menlo Park, CA, USA
  • Technique:
    Slumbot NL uses a variant of counterfactual regret minimization with public chance sampling.
  • References and related papers:
    1. "Slumbot NL: Solving Large Games with Counterfactual Regret Minimization Using Sampling and Distributed Processing" from the upcoming proceedings of the Computer Poker Workshop at AAAI-13.

Tartanian6

  • Team Name: Tartanian6
  • Team Leader: Tuomas Sandholm
  • Team Members: Noam Brown, Sam Ganzfried, Tuomas Sandholm
  • Affiliation: Carnegie Mellon University
  • Location: Pittsburgh, PA, USA
  • Technique:
    Tartanian6 plays an approximate Nash equilibrium strategy that was computed using MCCFR with external sampling on an imperfect recall abstraction. For the river betting round, it computes undominated equilibrium strategies in a finer-grained abstraction in real-time using CPLEX's LP solver.
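
    For intuition only, solving a tiny made-up zero-sum "endgame" as a linear program can be sketched as below with an off-the-shelf solver; the actual endgame solver works on a much larger sequence-form LP over the finer-grained river abstraction and computes undominated strategies, which this plain maximin sketch does not.

      import numpy as np
      from scipy.optimize import linprog

      def solve_zero_sum(payoff):
          """Maximin strategy for the row player of a zero-sum matrix game.

          payoff[i][j] is the row player's payoff when row i meets column j.
          Returns (strategy over rows, game value).
          """
          payoff = np.asarray(payoff, dtype=float)
          num_rows, num_cols = payoff.shape
          # Variables: the row strategy x (num_rows entries) followed by the value v.
          c = np.zeros(num_rows + 1)
          c[-1] = -1.0                       # maximize v  <=>  minimize -v
          # For every opponent column j:  v - sum_i x_i * payoff[i][j] <= 0
          a_ub = np.hstack([-payoff.T, np.ones((num_cols, 1))])
          b_ub = np.zeros(num_cols)
          a_eq = np.zeros((1, num_rows + 1))
          a_eq[0, :num_rows] = 1.0           # the strategy sums to 1
          b_eq = np.array([1.0])
          bounds = [(0, None)] * num_rows + [(None, None)]
          result = linprog(c, A_ub=a_ub, b_ub=b_ub, A_eq=a_eq, b_eq=b_eq, bounds=bounds)
          return result.x[:num_rows], result.x[-1]

      # Made-up 3x3 payoff matrix standing in for a river endgame.
      toy_payoffs = [[0.0, -1.0, 1.5], [1.0, 0.0, -2.0], [-1.5, 2.0, 0.0]]
      strategy, value = solve_zero_sum(toy_payoffs)
      print(np.round(strategy, 3), round(value, 3))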
  • References and related papers:
    1. Sam Ganzfried and Tuomas Sandholm. 2013. Improving Performance in Imperfect-Information Games with Large State and Action Spaces by Solving Endgames. Computer Poker and Imperfect Information Workshop at the National Conference on Artificial Intelligence (AAAI).
    2. Sam Ganzfried and Tuomas Sandholm. 2013. Action Translation in Extensive-Form Games with Large Action Spaces: Axioms, Paradoxes, and the Pseudo-Harmonic Mapping. To appear in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI).
    3. Sam Ganzfried, Tuomas Sandholm, and Kevin Waugh. 2012. Strategy Purification and Thresholding: Effective Non-Equilibrium Approaches for Playing Large Games. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
    4. Michael Johanson, Neil Burch, Richard Valenzano, and Michael Bowling. 2013. Evaluating State-Space Abstractions in Extensive-Form Games. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
    5. Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. 2009. Monte Carlo Sampling for Regret Minimization in Extensive Games. In Proceedings of Advances in Neural Information Processing Systems (NIPS).