- Details
- Category: Participants
- Published on Thursday, 17 January 2013 23:52
- Written by Super User
- Hits: 12986

The 2014 competition had 12 different agents in the heads-up limit Texas hold'em competition, 16 agents in the heads-up no-limit competition, and 6 agents in the 3-player limit competition. As in previous years, agents were submitted by a mixture of universities and individual hobbyists from at least 11 different countries around the world.

Competitors in the 2014 Annual Computer Poker Competition were not required to supply detailed information about their submission(s) in order to compete, but some information about team members, affiliations, locations, high-level technique descriptions, and occasionally relevant papers was supplied. This page presents that information.

**Team Name:**Cleverpiggy**Team Members:**Allen Cunningham**Affiliation:**Independent**Location:**Marina del Rey, CA, US**Non-dynamic Agent****Technique:**Cleverpiggy is the progeny of 20 billion iterations of chance-sampled CFR run on a quad-core Intel machine with 48 GB of RAM. She uses a card abstraction with 169, 567,528, 60,000, and 180,000 hand types for each respective street. Flop hands are essentially unabstracted. For the turn and river, board types are established by dividing all flops into 20 categories, each of which branches into 3 turns, each of which branches into 3 rivers, resulting in 60 turn and 180 river distinctions. Hands for each board type are then divided into 1,000 buckets based on earth mover's distance and OCHS clustering.
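The bucketing idea above — k-means over hand-strength histograms under earth mover's distance — can be sketched in miniature. The histograms, the cluster count, and the first-k initialization below are toy assumptions for illustration, not Cleverpiggy's actual data or pipeline:

```python
import numpy as np

def emd_1d(p, q):
    """1-D earth mover's distance between two histograms on the same
    bins: the L1 distance between their cumulative distributions."""
    return float(np.abs(np.cumsum(p) - np.cumsum(q)).sum())

def emd_kmeans(hists, k, iters=10):
    """Toy k-means over hand-strength histograms under EMD.
    Initializing centroids from the first k rows is a simplification;
    a real abstraction pipeline is far larger and more careful."""
    hists = np.asarray(hists, dtype=float)
    centroids = hists[:k].copy()
    labels = np.zeros(len(hists), dtype=int)
    for _ in range(iters):
        # assign each histogram to its nearest centroid under EMD
        labels = np.array([int(np.argmin([emd_1d(h, c) for c in centroids]))
                           for h in hists])
        # recompute each centroid as the mean histogram of its members
        for j in range(k):
            members = hists[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels
```

For 1-D histograms the EMD reduces exactly to the L1 distance between CDFs, which is why the helper is a one-liner.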

Regret Minimization in Games with Incomplete Information 2007

Evaluating State-Space Abstractions in Extensive-Form Games 2013

**Team Members:**Marv Andersen**Affiliation:**Independent**Location:**London, UK**Non-dynamic Agent****Technique:**This bot is a neural net trained to imitate the play of previous ACPC winners.

**Team Name:**Feste**Team Members:**Francois Pays**Affiliation:**Independent**Location:**Paris, France**Dynamic Agent****Technique:**The card abstraction uses 169, 1,500, 400, and 200 buckets for the preflop, flop, turn, and river respectively. The buckets are computed using k-means clustering over selected hand parameters such as expected value, standard deviation, and skewness on the last round.

The resulting abstraction is represented using sequence form with the imperfect recall extension and has 1.3 billion game states. It is solved using a custom interior point solver with indirect algebra [1]. The solver runs on a mid-range workstation and is GPU-accelerated with CUDA.

Feste has two strategies at its disposal: a defensive one, close to the equilibrium but still slightly offensive, and a second, decidedly aggressive and consequently off the equilibrium. During the course of the game, Feste uses Thompson sampling to select the strategy best adapted to the opponent.
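Thompson sampling over a small strategy portfolio can be illustrated with a minimal selector. Modelling each hand as a Bernoulli success with Beta(1,1) priors is an assumption made here for illustration; Feste's actual reward model is not described on this page:

```python
import random

class ThompsonSelector:
    """Pick among strategies by sampling from a Beta posterior over each
    strategy's per-hand success rate and playing the argmax (sketch)."""
    def __init__(self, n_strategies):
        self.wins = [1] * n_strategies    # Beta alpha (prior = 1)
        self.losses = [1] * n_strategies  # Beta beta  (prior = 1)

    def select(self):
        samples = [random.betavariate(w, l)
                   for w, l in zip(self.wins, self.losses)]
        return samples.index(max(samples))

    def update(self, i, won):
        # record the outcome of the hand played with strategy i
        if won:
            self.wins[i] += 1
        else:
            self.losses[i] += 1
```

The sampler naturally balances exploring each strategy against exploiting the one whose posterior looks best against the current opponent.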

[1] Francois Pays. 2014. An Interior Point Approach to Large Games of Incomplete Information. Proceedings of the AAAI-2014 Workshop on Computer Poker.

**Team Name:**University of Alberta**Team Members:****Michael Bowling**, Duane Szafron, Rob Holte, Nolan Bard, Neil Burch, Richard Gibson, John Hawkin, Michael Johanson, Trevor Davis, Josh Davidson, Dustin Morrill**Affiliation:**University of Alberta**Location:**Edmonton, Alberta, Canada**Dynamic Agent****Technique:**Hyperborean2014-2pl is an implicit modelling agent [2] consisting of four abstract strategies. All strategies were generated using the Counterfactual Regret Minimization (CFR) algorithm [8] with imperfect recall card abstractions. Buckets were calculated according to public card textures and the k-means Earthmover and k-means OCHS buckets recently presented by Johanson et al [6]. By forgetting previous card information and rebucketing on every round [7], this yields an imperfect recall abstract game.
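The CFR algorithm cited above repeatedly applies regret matching at every information set: the next strategy plays each action in proportion to its positive cumulative regret. The core update, independent of any particular agent, looks like:

```python
def regret_matching(regrets):
    """Strategy from cumulative regrets: normalize the positive part,
    falling back to uniform when no action has positive regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    if total > 0:
        return [p / total for p in pos]
    return [1.0 / len(regrets)] * len(regrets)
```

In full CFR this update runs at every information set on every iteration, and the time-averaged strategy profile is what converges toward equilibrium.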

The portfolio of strategies for the agent consists of:

1) An asymmetric equilibrium strategy

An asymmetric equilibrium strategy was generated to exploit mistakes that can be made by equilibrium based agents using smaller abstractions of the game [3]. The abstraction for the final strategy uses 169, 1,348,620, 1,521,978, and 840,000 buckets on each round, respectively. During training with CFR, the opponent uses a smaller abstraction with 169 buckets on the pre-flop, and 9,000 buckets on each subsequent round.

2) Three data biased robust counter strategies based on prior ACPC competitors

Three strategies in the portfolio are designed to exploit specific players from prior ACPC events. Each response was created using data biased robust counter strategies [5] to data from a particular competitor in prior ACPC events. An asymmetric abstraction is used for the frequentist model used by the data biased response [3], placing observations of the player in a smaller abstraction than the regret minimizing portions of the strategy.

A mixture of these strategies is dynamically generated using a slightly modified Exp4-like algorithm [1] where the reward vector for the experts/strategies is computed using importance sampling over the individual strategies [4].
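The generic exponential-weights step underlying an Exp4-style mixture can be sketched as follows. The learning rate and the reward vector here are placeholders; the actual entry uses importance-sampled reward estimates [4] and a modified update:

```python
import math

def exp4_update(weights, rewards, eta=0.1):
    """One exponential-weights step: multiply each expert's weight by
    exp(eta * estimated reward), then renormalize to a distribution."""
    new = [w * math.exp(eta * r) for w, r in zip(weights, rewards)]
    total = sum(new)
    return [w / total for w in new]
```

Repeating this step shifts probability mass toward whichever expert strategy has been earning the most against the current opponent, while never fully abandoning the others.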

[1] P Auer, N Cesa-Bianchi, Y Freund, and R.E Schapire. Gambling in a rigged casino: The adversarial multi-armed bandit problem. Proceedings of the 36th Annual Symposium on Foundations of Computer Science, 1995.

[2] Nolan Bard, Michael Johanson, Neil Burch, Michael Bowling. Online Implicit Agent Modelling. In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2013.

[3] Nolan Bard, Michael Johanson, Michael Bowling. Asymmetric Abstractions for Adversarial Settings. In Proceedings of the Thirteenth International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2014.

[4] Michael Bowling, Michael Johanson, Neil Burch, and Duane Szafron. Strategy Evaluation in Extensive Games with Importance Sampling. In Proceedings of the 25th Annual International Conference on Machine Learning (ICML), 2008.

[5] Michael Johanson and Michael Bowling. Data Biased Robust Counter Strategies. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), 2009.

[6] Michael Johanson, Neil Burch, Richard Valenzano, and Michael Bowling. Evaluating state-space abstractions in extensive-form games. In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 271–278, 2013.

[7] Kevin Waugh, Martin Zinkevich, Michael Johanson, Morgan Kan, David Schnizlein, and Michael Bowling. A Practical Use of Imperfect Recall. In Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (SARA), 2009.

[8] Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. Regret minimization in games with incomplete information. In Advances in Neural Information Processing Systems 20 (NIPS), 2007.

**Team Name:**PokerCPT**Team Members:**Luis Filipe Teofilo**Affiliation:**University of Porto, Artificial Intelligence and Computer Science Laboratory**Location:**Porto, Portugal**Dynamic Agent****Technique:**The base agent's strategies are Nash equilibrium (NE) approximations. Several NE strategies were computed, and the agent switches between them to make opponent modeling difficult (especially in Kuhn3P). To compute the NE strategies, an implementation of CFR was used. This implementation greatly reduces the game tree by removing decisions at chance nodes where the agent knows that it has a very high or very low probability of winning. For multiplayer poker, the CFR implementation abstracts game sequences. Card buckets were grouped according to their utility in smaller games. For no-limit, the actions were also abstracted into 4 possible decisions.

**Team Name:**PokerStar**Team Members:**Ingo Kaps**Location:**Frankfurt, Hessen, Germany**Dynamic Agent****Technique:**The PokerStar bot is written in Pascal.

Preflop, the PokerStar bot plays 2.5% fold, 95% call, and 2.5% raise, so that other bots cannot easily model it.

After the preflop, the bot calculates an opponent-based squared weighted hand strength and uses it to index an optimized static-bucket CFR table.

If the opponent's raise ratio is too low, a rule-based strategy is used; if it is very low, a Selby preflop strategy is additionally used. If the opponent's raise ratio is too high, the PokerStar bot always calls when the opponent raises and raises when the opponent checks.
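The "squared weighted hand strength" family of metrics is commonly computed as E[HS²]: the mean of the squared hand strength over sampled board runouts. A minimal version, without the opponent weighting (which is PokerStar-specific and not described here), is:

```python
def ehs2(strengths):
    """E[HS^2] over a list of hand strengths, one per sampled runout.
    Squaring rewards polarized drawing hands relative to plain E[HS]."""
    return sum(s * s for s in strengths) / len(strengths)
```

For example, a draw that ends up at strength 1.0 half the time and 0.0 otherwise scores higher under E[HS²] than a made hand sitting constantly at 0.6, even though its plain expected strength is lower — which is exactly why the squared form is preferred for bucketing.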

**Team Name:**ProPokerTools**Team Members:**Dan Hutchings**Affiliation:**ProPokerTools**Location:**Lakewood, Colorado, US**Non-dynamic Agent****Technique:**This HULHE agent was created using established methods: regret minimization, partial recall, etc.

Last year, I gave myself a constraint in building my AI agents: all agents were created on a single machine that costs less than $1,000. This year, I have loosened that constraint to allow $1,000 worth of rented compute time in the 'cloud'.

This year's entry has been improved in three areas: size (9 times larger), build time (9 times longer), and abstraction quality. Tests using the 2013 benchmark server show this agent would likely have placed third or fourth in the instant run-off competition had it been entered last year. Additional improvements have been developed but were not ready in time for this year's competition.

**Team Name:**Slugathorus**Team Members:**Daniel Berger**Affiliation:**University of South Wales**Dynamic Agent****Technique:**The agent combines a precomputed approximate Nash equilibrium strategy, generated by public chance sampled MCCFR, with a new modelling technique designed to identify when the opponent is making mistakes and exploit them.

"Modelling Player Weakness in Poker". (Berger, 2013)

"Efficient Nash Equilibrium Approximation through Monte Carlo Counterfactual Regret Minimization". (Johanson, 2012)

"Regret Minimization in Games With Incomplete Information". (Zinkevich, 2007)

**Team Name:**SmooCT**Team Members:**Johannes Heinrich**Affiliation:**University College London**Non-dynamic Agent****Technique:**SmooCT was trained from self-play Monte-Carlo tree search, using Smooth UCT [2]. The resulting strategy aims to approximate a Nash equilibrium. The agent uses an imperfect recall abstraction [1] based on an equidistant discretisation of expected hand strength squared values. The abstraction uses 169, 2000, 1000 and 600 buckets for the four betting rounds respectively.
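An equidistant discretisation like SmooCT's simply maps an E[HS²] value in [0, 1] to one of a fixed number of equal-width buckets — e.g. 600 on the river:

```python
def ehs2_bucket(value, n_buckets):
    """Map a hand-strength value in [0, 1] to one of n_buckets
    equal-width buckets; the top edge folds into the last bucket."""
    return min(int(value * n_buckets), n_buckets - 1)
```

This is the simplest possible bucketing scheme: unlike k-means-based abstractions, every bucket spans the same width of the metric, regardless of how hands are actually distributed.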

[1] Kevin Waugh, Martin Zinkevich, Michael Johanson, Morgan Kan, David Schnizlein, and Michael Bowling. "A Practical Use of Imperfect Recall". Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (SARA), 2009.

[2] Johannes Heinrich and David Silver. "Self-Play Monte-Carlo Tree Search in Computer Poker". To appear in 2014.

**Team Members:**Jon Parker**Affiliation:**Georgetown University and Johns Hopkins University**Non-dynamic Agent****Technique:**The solution was built using sparse dynamic programming. The "dynamic programming" portion of the solution is due to reusing precomputed "PayoffMatrix" objects and "StrategyBlock" objects. Saving the PayoffMatrices allows the bot to accurately estimate upstream payoffs with relatively little computation. Saving the StrategyBlock objects allows "new" Blocks to be initialized with solutions that already reflect some CFR-style iterations. No board abstraction is used because the "best" StrategyBlock to use for a betting round is found by looking up the nearest StrategyBlock given the relevant BettingSequence and BoardSequence.

My bot folds very few hands preflop (when compared against winners from prior HULHE competitions) and it also open calls a decent fraction of the time. I am somewhat concerned that the open calling behavior is indicative of an error in the bot somewhere. However, I have searched very hard for an error and haven't found one. Moreover, the overall "game value" I computed is almost identical to the "game value" Eric Jackson (winner from 2012) found (I asked him in an email about this).

**Team Name:**Feste**Team Members:**Francois Pays**Affiliation:**Independent**Location:**Paris, France**Non-dynamic Agent****Technique:**The card abstraction uses 169, 400, 50, and 25 buckets for the preflop, flop, turn, and river respectively. The buckets are computed using k-means clustering over selected hand parameters such as expected value, standard deviation, and skewness on the last round. The betting abstraction is quite coarse: half pot (only at first bet), pot, quad-pot, and all-in.

The abstraction is represented using sequence form with the imperfect recall extension. The resulting abstraction has only 300 million game states. It is solved using a custom interior point solver with indirect algebra [1]. The solver runs on a mid-range workstation and is GPU-accelerated with CUDA. The solver has been tested up to 3 billion game states and can therefore handle abstractions ten times larger but, interestingly, neither finer card nor finer betting abstractions resulted in stronger no-limit players.

Since Feste is not yet able to gather accurate enough information from its opponent in 3000-hand games, there is no dynamic adaptation. The instant runoff player follows a defensive strategy and the total-bankroll player, a slightly more aggressive one.

[1] Francois Pays. 2014. An Interior Point Approach to Large Games of Incomplete Information. Proceedings of the AAAI-2014 Workshop on Computer Poker.

**Team Name:**Cleverpiggy**Team Members:**Allen Cunningham**Affiliation:**Independent**Location:**Marina del Rey, CA, US**Non-dynamic Agent****Technique:**Hibiscus Biscuit comprises separately trained big blind and button strategies with knowledge of different bet sizes. In each case the hero uses only a couple of sizes but defends against many. Both sides use a card abstraction of 169, 20,000, 18,000, and 17,577 buckets. These consist of board card distinctions, earth mover's clustering for the flop and turn, and clustering over (wins, ties) for the river.

Regret Minimization in Games with Incomplete Information 2007

Evaluating State-Space Abstractions in Extensive-Form Games 2013

**Team Name:**University of Alberta**Team Members:****Michael Bowling**, Duane Szafron, Rob Holte, Nolan Bard, Neil Burch, Richard Gibson, John Hawkin, Michael Johanson, Trevor Davis, Josh Davidson, Dustin Morrill**Affiliation:**University of Alberta**Location:**Edmonton, Alberta, Canada**Non-dynamic Agent****Technique:**Hyperborean2014-NL-IRO is a Nash equilibrium approximation trained using PureCFR [1, Section 5.5], a recent CFR variant developed by Oskari Tammelin. Its card abstraction is symmetric and uses imperfect recall, with 169 (perfect) preflop buckets, 18,630 flop buckets, 3,700 turn buckets, and 1,175 river buckets, using the k-means Earthmover and k-means OCHS buckets recently presented by Johanson et al [2]. The betting abstraction is asymmetric, with different bet sizes and limits for the opponent and the agent. The opponent is anticipated to have a large number of bets, including min-bets and tenth-pot bets, with higher limits than the agent on how many bets of each type can be made in each round. The agent has a similar set of bets, including 0.1-pot and 0.25-pot but not min-bets, with bet sizes of 1.5-pot and less restricted to its first two actions. This asymmetric betting abstraction gives the agent the ability to interpret many opponent actions, limiting the impact of translation errors, while still having a few unusual bet sizes (0.1, 0.25, 0.65) that may cause translation errors in its opponents.

Since this agent is asymmetric, computing its strategy required solving two abstract games. The game with the opponent in seat 1 had 9,765,306,248 information sets and 26,879,972,986 infoset-actions, and the game with the opponent in seat 2 had 10,986,105,934 information sets and 30,285,810,764 infoset-actions.

While the average strategy is the component of CFR that converges to an equilibrium, for this set of strategies we only ran 80 billion and 84 billion iterations of PureCFR respectively, and for a game of this size we anticipated improvement up to 300 billion or more iterations. Instead of the average strategy, this agent uses the current strategy, which does not converge to equilibrium but has been demonstrated by Gibson to improve in-game performance much more quickly [1, Section 4.4.3].

[1] Richard Gibson. Regret Minimization in Games and the Development of Champion Multiplayer Computer Poker-Playing Agents. PhD Thesis. University of Alberta, 2013.

[2] Michael Johanson, Neil Burch, Richard Valenzano, and Michael Bowling. Evaluating state-space abstractions in extensive-form games. In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 271–278, 2013.

**Team Name:**University of Alberta**Team Members:****Michael Bowling**, Duane Szafron, Rob Holte, Nolan Bard, Neil Burch, Richard Gibson, John Hawkin, Michael Johanson, Trevor Davis, Josh Davidson, Dustin Morrill**Affiliation:**University of Alberta**Location:**Edmonton, Alberta, Canada**Dynamic Agent****Technique:**Hyperborean2014-2pn-TBR is an implicit modelling agent [2] consisting of three abstract strategies. All strategies were generated using the Counterfactual Regret Minimization (CFR) algorithm [10] with imperfect recall abstractions [9]. We also abstract the raise action to a number of bets relative to the pot size. All strategies make raises equal to 0.5, 0.75, 1, 1.5, 3, 6, 11, 20, or 40 times the pot size, or go all-in. The portfolio of strategies for the agent consists of:

1) A Nash equilibrium approximation

To create our abstract game for the strategy, we first partitioned the betting sequences into two parts: an "important" part and an "unimportant" part. Importance was determined according to the frequency with which one of our preliminary 2-player no-limit programs was faced with a decision at that betting sequence in self-play, as well as according to the number of chips in the pot. Then, we employed two different granularities of abstraction, one for each part of this partition. The unimportant part used 169, 3,700, 3,700, and 3,700 buckets per betting round respectively, while the important part used 169, 180,000, 1,530,000, and 1,680,000 buckets per betting round respectively. Buckets were calculated according to public card textures and the k-means Earthmover and k-means OCHS buckets recently presented by Johanson et al [8]. By forgetting previous card information and rebucketing on every round [9], this yields an imperfect recall abstract game. The strategy profile of this abstract game was computed from approximately 498 billion iterations of PureCFR [5, Section 5.5], a recent CFR variant developed by Oskari Tammelin. This type of strategy is also known as a "dynamic expert strategy" [6].

2) A data biased response to aggregate data of 2011 and 2012 ACPC competitors

One exploitive response in the portfolio was created using data biased robust counter strategies [7] to aggregate data from all of the agents in the 2011 and 2012 heads-up no-limit ACPC events. It uses the same betting abstraction as the above Nash equilibrium approximation, but the card abstraction consists of 169, 9000, 9000, and 3700 k-means Earthmover and k-means OCHS buckets per betting round uniformly across the game tree. An asymmetric abstraction is used for the frequentist model used by the data biased response [3]. The model's abstraction ignores card information and only models agents on their abstract betting.

3) A data biased response to aggregate data of some 2013 ACPC competitors

The second exploitive response uses the same abstract game as the previous DBR, but only uses data from agents that the 2013 Hyperborean TBR entry did not beat by at least 750 mbb/g.

A mixture of these agents is dynamically generated using a slightly modified Exp4-like algorithm [1] where the reward vector for the experts/strategies is computed using importance sampling over the individual strategies [4].

[1] P Auer, N Cesa-Bianchi, Y Freund, and R.E Schapire. Gambling in a rigged casino: The adversarial multi-armed bandit problem. Proceedings of the 36th Annual Symposium on Foundations of Computer Science, 1995.

[2] Nolan Bard, Michael Johanson, Neil Burch, Michael Bowling. Online Implicit Agent Modelling. In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2013.

[3] Nolan Bard, Michael Johanson, Michael Bowling. Asymmetric Abstractions for Adversarial Settings. In Proceedings of the Thirteenth International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2014.

[4] Michael Bowling, Michael Johanson, Neil Burch, and Duane Szafron. Strategy Evaluation in Extensive Games with Importance Sampling. In Proceedings of the 25th Annual International Conference on Machine Learning (ICML), 2008.

[5] Richard Gibson. Regret Minimization in Games and the Development of Champion Multiplayer Computer Poker-Playing Agents. PhD Thesis. University of Alberta, 2013.

[6] Richard Gibson and Duane Szafron. On Strategy Stitching in Large Extensive Form Multiplayer Games. In Proceedings of the Twenty-Fifth Conference on Neural Information Processing Systems (NIPS), 2011.

[7] Michael Johanson and Michael Bowling. Data Biased Robust Counter Strategies. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), 2009.

[8] Michael Johanson, Neil Burch, Richard Valenzano, and Michael Bowling. Evaluating state-space abstractions in extensive-form games. In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 271–278, 2013.

[9] Kevin Waugh, Martin Zinkevich, Michael Johanson, Morgan Kan, David Schnizlein, and Michael Bowling. A Practical Use of Imperfect Recall. In Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (SARA), 2009.

[10] Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. Regret minimization in games with incomplete information. In Advances in Neural Information Processing Systems 20 (NIPS), 2007.

**Team Name:**KEmpfer**Team Members:****Eneldo Loza Mencia**, Julian Prommer**Affiliation:**Knowledge Engineering Group - Technische Universität Darmstadt**Location:**Darmstadt, Germany**Non-dynamic Agent****Technique:**The agent implements and follows a list of expert rules. Additional opponent statistics can be collected and used in the rules, but we currently do not make use of this option. If no expert rule matches, the backup strategy is to play according to all-in equity and pot odds.

**Team Name:**Little Rock**Team Members:**Rod Byrnes**Affiliation:**Independent**Location:**Goonellabah, NSW, Australia**Non-dynamic Agent****Technique:**External sampling MCCFR approach, virtually the same as last year but with a few enhancements to the card abstraction and action abstraction techniques.

Monte Carlo Sampling for Regret Minimization in Extensive Games. Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. In Advances in Neural Information Processing Systems 22 (NIPS), pp. 1078-1086, 2009.

**Team Name:**PokerCPT**Team Members:**Luis Filipe Teofilo**Affiliation:**University of Porto, Artificial Intelligence and Computer Science Laboratory**Location:**Porto, Portugal**Dynamic Agent****Technique:**The base agent's strategies are Nash equilibrium (NE) approximations. Several NE strategies were computed, and the agent switches between them to make opponent modeling difficult (especially in Kuhn3P). To compute the NE strategies, an implementation of CFR was used. This implementation greatly reduces the game tree by removing decisions at chance nodes where the agent knows that it has a very high or very low probability of winning. For multiplayer poker, the CFR implementation abstracts game sequences. Card buckets were grouped according to their utility in smaller games. For no-limit, the actions were also abstracted into 4 possible decisions.

**Team Name:**Nyx**Team Members:**Martin Schmid**Affiliation:**Charles University**Location:**Prague, Czech Republic**Non-dynamic Agent****Technique:**An improved version of the previous Nyx, with a better action abstraction and a new automatic public card abstraction.

**Team Name:**PijaiBot**Team Members:**Ryan Pijai**Affiliation:**Independent**Location:**Orlando, FL, USA**Non-dynamic Agent****Technique:**PijaiBot is an Imperfect Recall, Approximate Nash Equilibrium agent with novel approaches to card-clustering, bet-translating, and opponent-trapping. All non-isomorphically-similar card situations are grouped together using K-Means Clustering with Bhattacharyya Distance of Expected Hand-Strengths as the distance measure rather than more traditional distance measures. When opponent bet sizes do not match any of the sizes in PijaiBot's betting abstraction, PijaiBot interprets bet sizes using Soft Translation of Geometric Similarity based on pot-odds, rather than on pot-relative or stack-relative bet sizes as has been done in the past.

PijaiBot has special-case logic for translating small bets that do not match any of its abstraction bet sizes into checks and calls, and is able to patch up its internal abstract betting history as needed to make those translations valid action sequences. PijaiBot attempts to exploit other agents that do not handle this and other types of similarly tricky situations properly by occasionally overriding PijaiBot's own Nash Equilibrium strategy suggestions with potentially more damaging actions that test and confuse its opponents by inducing them into misrepresenting the true state of the game.
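One plausible form of soft geometric bet translation maps an observed bet probabilistically onto the two nearest abstract sizes, weighted by log-distance. PijaiBot's actual pot-odds-based similarity measure is not published here, so this is only a hedged sketch of the general idea:

```python
import math

def soft_translate(bet, sizes):
    """Map an observed bet onto the abstraction's bet sizes.
    Bets outside the abstraction's range map deterministically to the
    nearest edge; bets in between split probability geometrically
    between the bracketing sizes. Returns (size, probability) pairs."""
    sizes = sorted(sizes)
    if bet <= sizes[0]:
        return [(sizes[0], 1.0)]
    if bet >= sizes[-1]:
        return [(sizes[-1], 1.0)]
    for a, b in zip(sizes, sizes[1:]):
        if a <= bet <= b:
            # weight toward b grows with the bet's log-distance from a
            w_b = math.log(bet / a) / math.log(b / a)
            return [(a, 1.0 - w_b), (b, w_b)]
```

Randomizing the mapping (rather than always snapping to the nearest size) makes it harder for an opponent to pick bet sizes that are systematically misread.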

**Team Name:**Prelude**Team Members:**Tim Reiff**Affiliation:**Unfold Poker**Location:**Las Vegas, NV, USA**Non-dynamic Agent****Technique:**Prelude is an equilibrium strategy that implements several published techniques, including the training algorithm Pure CFR, the opponent bet translation method, and a card abstraction based on k-means clustering over hand strength distributions. I had hoped to test an agent with some ambitious importance sampling, but that one is converging slowly... Thus, I whipped up a more conservative entry, with a streamlined betting tree partly designed for faster training. I snuck in a few minor experiments still, including modified EV histograms and OCHS categories for the card abstraction, selective use of purification thresholding, and some speculative bet sizing adjustments.

1. Sam Ganzfried and Tuomas Sandholm. "Tartanian5: A Heads-Up No-Limit Texas Hold'em Poker-Playing Program". In Computer Poker Symposium at the National Conference on Artificial Intelligence (AAAI), 2012.

2. Richard Gibson. "Regret Minimization in Games and the Development of Champion Multiplayer Computer Poker-Playing Agents". PhD thesis, University of Alberta, 2014.

3. Greg Hamerly. "Making k-means even faster". In proceedings of the 2010 SIAM international conference on data mining (SDM 2010), April 2010.

4. Eric Jackson. "Slumbot NL: Solving Large Games with Counterfactual Regret Minimization Using Sampling and Distributed Processing". In Computer Poker Workshop on Artificial Intelligence (AAAI), 2013.

5. Michael Johanson, Neil Burch, Richard Valenzano, and Michael Bowling. "Evaluating State-Space Abstractions in Extensive-Form Games". In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2013.

6. Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. "Regret Minimization in Games with Incomplete Information". In Proceedings of Advances in Neural Information Processing Systems (NIPS), 2007.

**Team Name:**Sartre**Team Members:****Kevin Norris**, Jonathan Rubin, Ian Watson**Affiliation:**The University of Auckland**Location:**Auckland, New Zealand**Dynamic Agent****Technique:**SartreNLExp combines an approximate Nash equilibrium strategy with exploitation capabilities. It plays a base approximate Nash equilibrium strategy that was created by imitating the play [1] of the 2013 Slumbot agent [2]. SartreNLExp also incorporates a statistical exploitation module that models the opponent online and identifies exploitable statistical anomalies in the opponent's play. When a game state arises where the statistical exploitation module is able to exploit one of the opponent's statistical anomalies, it overrides the base strategy and provides an exploitive action. Together, the base strategy and statistical exploitation module provide safe opponent exploitation, given that the opponent model is an accurate reflection of the opponent's action frequencies. The agent has been improved from its previous iteration, presented in [3]. The exploitation capabilities of the statistical exploitation module have been greatly improved, and the opponent model has been entirely overhauled. Additionally, a novel decaying history method, statistic-specific decaying history, has been implemented to ensure the opponent model accurately reflects the frequency statistics of both static and dynamic opponents.
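A "decaying history" frequency statistic can be sketched as an exponentially discounted counter, so that recent observations dominate and the model can track opponents who change their play. The decay constant below, and Sartre's statistic-specific decay schedule, are assumptions for illustration:

```python
class DecayingFrequency:
    """Exponentially decayed action frequency: both the occurrence count
    and the observation count are discounted before each new observation,
    so old hands fade from the estimate."""
    def __init__(self, decay=0.9):
        self.decay = decay
        self.count = 0.0
        self.total = 0.0

    def observe(self, occurred):
        self.count = self.count * self.decay + (1.0 if occurred else 0.0)
        self.total = self.total * self.decay + 1.0

    def frequency(self):
        return self.count / self.total if self.total else 0.0
```

Against a static opponent this converges to the true frequency; against a dynamic one, the estimate re-converges after roughly 1/(1 - decay) observations of the new behaviour.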


1. Rubin, J., & Watson, I. (2011). Successful performance via decision generalisation in no limit Texas Hold’em. In Case-Based Reasoning Research and Development (pp. 467-481). Springer Berlin Heidelberg.

2. Jackson, E. (2013, June). Slumbot NL: Solving Large Games with Counterfactual Regret Minimization Using Sampling and Distributed Processing. In Workshops at the Twenty-Seventh AAAI Conference on Artificial Intelligence.

3. Norris, K., & Watson, I. (2013, August). A statistical exploitation module for Texas Hold'em: And it's benefits when used with an approximate nash equilibrium strategy. In Computational Intelligence in Games (CIG), 2013 IEEE Conference on (pp. 1-8). IEEE.

**Team Name:**Slumbot**Team Members:**Eric Jackson**Affiliation:**Independent**Non-dynamic Agent****Technique:**I use Pure External CFR to compute an approximate equilibrium. A decomposition technique described in my forthcoming workshop paper allows me to break the game tree into pieces that can be solved independently. I employ an abstraction that uses more granularity (both more bet sizes and more buckets) at more commonly reached game states.

"A Time and Space Efficient Algorithm for Approximately Solving Large Imperfect Information Games"; Eric Jackson; 2014; forthcoming in the Proceedings of the Workshop on Computer Poker and Imperfect Information at AAAI-14.

**Team Name:**Tartanian**Team Members:**Noam Brown, Sam Ganzfried,**Tuomas Sandholm****Affiliation:**Carnegie Mellon University**Location:**Pittsburgh, PA, USA**Non-dynamic Agent****Technique:**

Tartanian7 plays an approximate Nash equilibrium strategy that was computed on Pittsburgh's shared-memory supercomputer, which has a cache coherent Non-Uniform Memory Access (ccNUMA) architecture. We developed a new abstraction algorithm and a new equilibrium-finding algorithm that enabled us to perform a massive equilibrium computation on this architecture.

The abstraction algorithm first clusters public flop boards, assigning each cluster to a blade on the supercomputer. These public clusters are computed by clustering using a distance function based on how often our abstraction from last year grouped hands together on the flop with different sets of public cards. Within each cluster, the algorithm then buckets the flop, turn, and river hands that are possible given one of the public flops in the cluster, using an imperfect-recall abstraction algorithm. We did not perform any abstraction for the preflop round.
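The co-bucketing distance between public flops described above can be sketched as the fraction of private hands that a reference abstraction places in different buckets on the two boards. Here `bucket_of` is a hypothetical lookup standing in for last year's abstraction:

```python
def cobucket_distance(board_i, board_j, bucket_of, hands):
    """Distance between two public flops: the fraction of private hands
    that bucket_of(board, hand) assigns to different buckets on the two
    boards. Boards that abstract hands identically get distance 0."""
    diff = sum(bucket_of(board_i, h) != bucket_of(board_j, h)
               for h in hands)
    return diff / len(hands)
```

A distance like this can feed any standard clustering routine to group strategically similar public flops, which is what allows each cluster to be solved on its own blade.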

Our equilibrium-finding algorithm is a modified version of external-sampling MCCFR. It samples one pair of preflop hands per iteration. For the postflop, each blade samples community cards from its public cluster and performs MCCFR in parallel. The samples are weighted to remove bias.

Our agent also uses a novel reverse mapping technique that compensates for CFR's failure to fully converge and for possible overfitting of the strategies to the abstraction.

**Team Name:**University of Alberta**Team Members:****Michael Bowling**, Duane Szafron, Rob Holte, Nolan Bard, Neil Burch, Richard Gibson, John Hawkin, Michael Johanson, Trevor Davis, Josh Davidson, Dustin Morrill**Affiliation:**University of Alberta**Location:**Edmonton, Alberta, Canada**Non-dynamic Agent****Technique:**(NOTE: This agent is the same as the 2013 ACPC's 3-player instant run-off Hyperborean entry.)

Hyperborean2014-3pl-IRO is a Nash equilibrium approximation trained using PureCFR [1, Section 5.5], a recent CFR variant developed by Oskari Tammelin. Because 3-player hold'em is too large a game to apply CFR techniques directly, we employed an abstract game that merges card deals into "buckets" to create a game of manageable size.

To create our abstract game, we first partitioned the betting sequences into two parts: an "important" part, and an "unimportant" part. Importance was determined according to the frequency with which our 3-player programs from the 2011 and 2012 ACPCs were faced with a decision at that betting sequence, as well as according to the number of chips in the pot. Then, we employed two different granularities of abstraction, one for each part of this partition. The unimportant part used 169, 180,000, 18,630, and 875 buckets per betting round respectively, while the important part used 169, 1,348,620, 1,530,000, and 2,800,000 buckets per betting round respectively. Buckets were calculated according to public card textures and k-means clustering over hand strength distributions [3] and yielded an imperfect recall abstract game by forgetting previous card information and rebucketing on every round [4]. The agent plays the "current strategy profile" computed from approximately 303.6 billion iterations of the PureCFR variant of CFR [1] applied to this abstract game. This type of strategy is also known as a "dynamic expert strategy" [2].
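The imperfect-recall construction can be made concrete with a small sketch (the key layout is hypothetical): the information-set key keeps the full betting sequence but retains only the current round's card bucket, forgetting earlier buckets.

```python
def infoset_key_perfect(bucket_history, betting):
    """Perfect recall: the key includes the card bucket from every
    round seen so far, so no histories are merged."""
    return (tuple(bucket_history), betting)

def infoset_key_imperfect(bucket_history, betting):
    """Imperfect recall: earlier card buckets are forgotten; only the
    current round's bucket (plus the betting) identifies the state.
    Histories that differ only in past buckets collapse together."""
    return (bucket_history[-1], betting)
```

Merging histories this way shrinks the abstract game, which is what allows far more buckets on the later rounds for the same memory budget.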

[1] Richard Gibson. Regret Minimization in Games and the Development of Champion Multiplayer Computer Poker-Playing Agents. PhD Thesis. University of Alberta, 2013.

[2] Richard Gibson and Duane Szafron. On Strategy Stitching in Large Extensive Form Multiplayer Games. In Proceedings of the Twenty-Fifth Conference on Neural Information Processing Systems (NIPS), 2011.

[3] Michael Johanson, Neil Burch, Richard Valenzano, and Michael Bowling. Evaluating state-space abstractions in extensive-form games. In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 271–278, 2013.

[4] Kevin Waugh, Martin Zinkevich, Michael Johanson, Morgan Kan, David Schnizlein, and Michael Bowling. A Practical Use of Imperfect Recall. In Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (SARA), 2009.

**Team Name:**University of Alberta**Team Members:****Michael Bowling**, Duane Szafron, Rob Holte, Nolan Bard, Neil Burch, Richard Gibson, John Hawkin, Michael Johanson, Trevor Davis, Josh Davidson, Dustin Morrill**Affiliation:**University of Alberta**Location:**Edmonton, Alberta, Canada**Non-dynamic Agent****Technique:**(NOTE: This agent is the same as the 2013 ACPC's 3-player total bankroll Hyperborean entry.)

Hyperborean2014-3pl-TBR is a data biased response to aggregate data of ACPC competitors from the 2011 and 2012 3-player limit competitions [2]. The strategy was generated using the Counterfactual Regret Minimization (CFR) algorithm [6]. Asymmetric abstractions were used for the regret minimizing part of each player's strategy, and the frequentist model used by data biased response [1]. Each abstraction uses imperfect recall, forgetting previous card information and rebucketing on every round [5], with the k-means Earthmover and k-means OCHS buckets recently presented by Johanson et al [3]. The agent's strategy uses an abstraction with 169, 10000, 5450, and 500 buckets on each round of the game, respectively. The model of prior ACPC competitors groups observations from all competitors into a model using 169, 900, 100, and 25 buckets on each round of the game, respectively. The agent plays the "current strategy profile" generated after 20 billion iterations of external sampled CFR [4].
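A data biased response leans on a frequentist model of past opponents. The sketch below shows one plausible shape of such a model; the blending rule and the `confidence` constant are illustrative assumptions, not the construction from the cited papers.

```python
def biased_model(counts, default, confidence=10):
    """Blend observed action frequencies toward a default strategy,
    trusting the observations more as their count grows.
    counts: observed action counts at an information set.
    default: fallback strategy used where data is sparse.
    confidence: hypothetical prior weight (pseudo-observations)."""
    n = sum(counts)
    if n == 0:
        return list(default)
    w = n / (n + confidence)          # weight on the observed data
    freq = [c / n for c in counts]    # empirical action frequencies
    return [w * f + (1 - w) * d for f, d in zip(freq, default)]
```

With no observations the model is just the default; with ten observations and `confidence=10`, data and default are weighted equally. Responding to such a model, rather than to raw frequencies, is what makes the counter-strategy robust.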

References to relevant papers, if any:

[1] Nolan Bard, Michael Johanson, Michael Bowling. Asymmetric Abstractions for Adversarial Settings. In Proceedings of the Thirteenth International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), 2014.

[2] Michael Johanson and Michael Bowling. Data Biased Robust Counter Strategies. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), 2009.

[3] Michael Johanson, Neil Burch, Richard Valenzano, and Michael Bowling. Evaluating State-Space Abstractions in Extensive-Form Games. In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2013.

[4] Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling. Monte Carlo Sampling for Regret Minimization in Extensive Games. In Proceedings of the Twenty-Third Conference on Neural Information Processing Systems (NIPS), 2009.

[5] Kevin Waugh, Martin Zinkevich, Michael Johanson, Morgan Kan, David Schnizlein, and Michael Bowling. A Practical Use of Imperfect Recall. In Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (SARA), 2009.

[6] Martin Zinkevich, Michael Johanson, Michael Bowling, and Carmelo Piccione. Regret minimization in games with incomplete information. In Advances in Neural Information Processing Systems 20 (NIPS), 2007.

**Team Name:**KEmpfer**Team Members:****Eneldo Loza Mencia**, Julian Prommer**Affiliation:**Knowledge Engineering Group - Technische Universität Darmstadt**Location:**Darmstadt, Germany**Non-dynamic Agent****Technique:**This agent tries to mimic the behaviour of a given poker agent. It hence follows a strategy similar to Sartre from previous years, with two differences. Firstly, in contrast to Sartre, which uses case-based reasoning (essentially k-nearest neighbors), we allow any learning algorithm to be used; in this particular submission, we used C4.5 (more specifically, Weka's implementation J48) to induce a model of a poker agent. Secondly, a much more complete representation of the state is used, with up to 50 possible features. We even induce features that help model the opponent modelling performed by the imitated agent.

For this year's submission, we learned the behaviour of Hyperborean from the logs of the 2013 three-player limit competition. Since Hyperborean uses a CFR strategy, we expect our bot to behave accordingly. However, it is not possible to perfectly replicate the behaviour of a bot (at least with the available data), so we expect our agent to perform worse than a corresponding CFR-based opponent in this year's competition.
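The imitation pipeline can be sketched with a frequency table standing in for the J48 decision tree (the feature encoding and function names are hypothetical): learn, from logged (state features, action) pairs, the action the imitated agent took most often in each discretized state.

```python
from collections import Counter, defaultdict

def fit_imitator(observations):
    """Imitation learning by lookup table (a stand-in for the induced
    C4.5/J48 model): map each discretized feature tuple to the action
    the imitated agent took most often there."""
    table = defaultdict(Counter)
    for features, action in observations:
        table[features][action] += 1
    return {f: c.most_common(1)[0][0] for f, c in table.items()}

def act(model, features, default="call"):
    """Play the learned action; fall back to a default for states
    never seen in the logs (one source of imitation error)."""
    return model.get(features, default)
```

A real decision tree generalizes across unseen feature combinations instead of falling back to a default, which is exactly why a learner like J48 is preferable to a raw lookup table.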

- Jonathan Rubin and Ian Watson. Case-based Strategies in Computer Poker. AI Communications 25(1), pages 19–48, 2012.

- Jonathan Rubin and Ian Watson. Successful Performance via Decision Generalisation in No Limit Texas Hold'em. In Case-Based Reasoning Research and Development, Vol. 6880, pages 467–481. Springer Berlin Heidelberg, 2011.

- Theo Kischka. Trainieren eines Computer-Pokerspielers (Training a Computer Poker Player). Bachelor's Thesis, Technische Universität Darmstadt, Knowledge Engineering Group, 2014. http://www.ke.tu-darmstadt.de/lehre/arbeiten/bachelor/2014/Kischka_Theo.pdf

**Team Name:**PokerCPT**Team Members:**Luis Filipe Teofilo**Affiliation:**University of Porto, Artificial Intelligence and Computer Science Laboratory**Location:**Porto, Portugal**Dynamic Agent****Technique:**The base agent's strategies are Nash equilibrium (NE) strategies. Several NE strategies were computed, and the agent switches between them to make opponent modeling difficult (especially on Kuhn3P). To compute the NE strategies, an implementation of CFR was used. This implementation greatly reduces the game tree by removing decisions at chance nodes where the agent knows it has a very high or very low probability of winning. For multiplayer Poker, the CFR implementation abstracts game sequences. The methodology for grouping card buckets was based on grouping buckets by their utility in smaller games. As for no-limit, the actions were also abstracted into four possible decisions.

**Team Name:**SmooCT**Team Members:**Johannes Heinrich**Affiliation:**University College London**Location:**London, UK**Non-dynamic Agent****Technique:**SmooCT was trained from self-play Monte-Carlo tree search, using Smooth UCT [2]. The agent uses an imperfect recall abstraction [1] based on an equidistant discretisation of expected hand strength squared values. The abstraction uses 169 and 1000 buckets for the first two betting rounds. For the turn and river the abstraction granularity has been locally refined based on the number of visits to a node in self-play training. The numbers of turn and river buckets lie in [100,400] and [10,160] respectively.
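Smooth UCT mixes greedy UCB selection with sampling from the average strategy accumulated during self-play. The sketch below shows that mixing; the mixing schedule and constants are illustrative assumptions, not SmooCT's actual parameters.

```python
import math
import random

def smooth_uct_select(node, c=1.0, eta=0.9, gamma=0.1, rng=random):
    """Smooth UCT action selection (sketch). `node` maps each action
    to its visit count and mean value. With probability eta_k, pick
    the UCB-maximizing action; otherwise sample from the average
    strategy (visit-count proportions). The decay of eta_k toward the
    floor gamma is a hypothetical schedule."""
    total = sum(node["visits"].values())
    eta_k = max(gamma, eta / (1 + 0.01 * math.sqrt(total)))
    actions = list(node["visits"])
    if rng.random() < eta_k:
        # Standard UCB1 exploration bonus on top of the mean value.
        return max(actions, key=lambda a: node["value"][a]
                   + c * math.sqrt(math.log(total + 1) / (node["visits"][a] + 1)))
    # Average-strategy branch: sample in proportion to visit counts.
    weights = [node["visits"][a] for a in actions]
    return rng.choices(actions, weights=weights)[0]
```

Mixing in the average strategy is what lets the self-play process converge toward an equilibrium rather than oscillating, which plain UCT tends to do in imperfect-information games.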

[1] Kevin Waugh, Martin Zinkevich, Michael Johanson, Morgan Kan, David Schnizlein, and Michael Bowling. "A Practical Use of Imperfect Recall". Proceedings of the Eighth Symposium on Abstraction, Reformulation and Approximation (SARA), 2009.

[2] Johannes Heinrich and David Silver. "Self-Play Monte-Carlo Tree Search in Computer Poker". To appear in 2014.
