The 2016 heads-up no-limit Texas hold'em competition featured 9 different agents. As in previous years, agents were submitted by a mixture of universities and individual hobbyists from 5 countries around the world.
Competitors in the 2016 Annual Computer Poker Competition were not required to supply detailed information about their submission(s) in order to compete, but some teams provided information about their members, affiliation, location, and high-level techniques, and occasionally pointed to relevant papers. This page presents that information.
Act1 was trained with an experimental distributed implementation of the Pure CFR algorithm. A heuristic was added to occasionally skip some game-tree paths, reducing the time spent per training iteration. To compensate for imperfect recall, the river card abstraction was constructed with a distance metric that considers features from all postflop streets. Several bet sizes were omitted because they offer little benefit against other equilibrium-based opponents while requiring a disproportionate amount of resources to train and store.
The final strategy comprises 159 billion information sets (430 billion information set-action pairs) and was trained for 5.15 trillion iterations.
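Act1's distributed system is not public, but the regret-matching update at the heart of Pure CFR can be sketched on a toy zero-sum game. Everything below (the game, function names, iteration count) is an illustrative assumption, not Act1's code; the one Pure CFR-specific touch is that sampling a single pure action per player per iteration keeps the regret values integer, as in Pure CFR.

```python
import random

# Rock-paper-scissors payoff matrix for the row player (toy stand-in for poker)
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def regret_matching(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    pos = [max(r, 0) for r in regrets]
    total = sum(pos)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in pos]

def train(iterations, seed=0):
    rng = random.Random(seed)
    regrets = [[0, 0, 0], [0, 0, 0]]          # integer regrets, as in Pure CFR
    strategy_sum = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strats = [regret_matching(r) for r in regrets]
        # sample one pure action per player from the current strategy
        acts = [rng.choices(range(3), weights=s)[0] for s in strats]
        for p in range(2):
            if p == 0:
                util = [PAYOFF[a][acts[1]] for a in range(3)]
            else:
                util = [-PAYOFF[acts[0]][a] for a in range(3)]
            played = util[acts[p]]
            for a in range(3):
                regrets[p][a] += util[a] - played   # stays integer
                strategy_sum[p][a] += strats[p][a]
    # the *average* strategy is what converges toward equilibrium
    return [[s / iterations for s in strategy_sum[p]] for p in range(2)]
```

For rock-paper-scissors the average strategies approach the uniform equilibrium; in CFR proper, the same update is applied at every information set of the game tree rather than once per player.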
This bot implements a CFR strategy. To train the policy, we used the open Pure CFR implementation and adapted it to heads-up no-limit. In addition, we implemented some more advanced techniques, such as card and bucket clustering.
Automatic public card abstraction for the flop round: Schmid, M., Moravcik, M., Hladik, M., & Gaukroder, S. J. (2015, January). Automatic Public State Space Abstraction in Imperfect Information Games. In Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence.
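As a rough illustration of what card bucketing looks like in general, here is a minimal one-dimensional k-means sketch that groups hands by a single scalar feature such as equity. The feature values, the number of buckets, and the quantile initialisation are all assumptions made for the example; this is not the submission's actual clustering.

```python
def kmeans_1d(values, k, iters=50):
    """Cluster scalar hand features (e.g. equities in [0, 1]) into k buckets.

    Hands that land in the same bucket are treated as identical by the
    abstraction, shrinking the game the solver has to handle."""
    srt = sorted(values)
    # deterministic quantile initialisation (an assumed choice for the sketch)
    centers = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            buckets[nearest].append(v)
        # move each center to the mean of its bucket; keep it if the bucket is empty
        centers = [sum(b) / len(b) if b else centers[i]
                   for i, b in enumerate(buckets)]
    def bucket(v):
        return min(range(k), key=lambda c: abs(v - centers[c]))
    return centers, bucket
```

Real card abstractions typically cluster richer multi-dimensional features (e.g. equity histograms) with a distance such as earth mover's, but the bucketing principle is the same.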
We have attempted to create a generic opponent model by mining logs from previous matches, especially those of well-performing bots. In future work we plan to implement in-game modelling of opponents.
Games from past match logs are used to calculate the equity of the current hand and board cards. The bot then makes randomized decisions, choosing each action with the frequency observed in those historical games. Because purely random play is not very effective, there are also a couple of hardcoded rules the bot must consider before committing to an action.
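The decision rule described above can be sketched as frequency-weighted sampling followed by a rule filter. The action counts, the pot-odds parameter, and the "never fold when calling is free" rule are all assumptions invented for the example, not the bot's actual data or rule set.

```python
import random

# Assumed example: action counts observed in historical games for one situation
HISTORY_COUNTS = {"fold": 10, "call": 60, "raise": 30}

def rule_filter(action, pot_odds):
    """Hardcoded sanity rule applied before committing to a sampled action.

    Illustrative rule only: never fold when continuing costs nothing."""
    if action == "fold" and pot_odds == 0.0:
        return "call"
    return action

def sample_action(counts, pot_odds, rng):
    """Sample an action with probability proportional to its historical count,
    then run it through the hardcoded rules."""
    actions = list(counts)
    weights = [counts[a] for a in actions]
    action = rng.choices(actions, weights=weights)[0]
    return rule_filter(action, pot_odds)
```

For example, with the counts above the bot would call about 60% of the time, raise about 30%, and fold about 10%, except that any sampled fold is overridden when checking is free.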
Slumbot is a large Counterfactual Regret Minimization (CFR) implementation. It uses the external sampling variant of MCCFR (Monte Carlo CFR) and employs a symmetric abstraction.
We used a distributed implementation of CFR running on eleven r3.4xlarge Amazon EC2 instances.
More details can be found in my paper to be presented at the 2016 Computer Poker Workshop at AAAI.
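Slumbot's abstraction and distributed training are far larger than anything that fits here, but external-sampling MCCFR itself can be shown end to end on Kuhn poker. This is a generic textbook sketch (the game encoding, names, and iteration count are assumptions), not Slumbot's implementation. The defining feature of external sampling is visible in the traversal: the traversing player expands all of its own actions, while opponent and chance actions are sampled.

```python
import random

ACTIONS = "pb"  # p = pass (check/fold), b = bet (call when facing a bet)

def is_terminal(h):
    return h in ("pp", "bb", "bp", "pbp", "pbb")

def terminal_utility(h, cards, player):
    """Utility for `player` at terminal history h (antes of 1, bets of 1)."""
    opp = 1 - player
    if h == "bp":                      # player 1 folded to player 0's bet
        return 1 if player == 0 else -1
    if h == "pbp":                     # player 0 folded to player 1's bet
        return 1 if player == 1 else -1
    payoff = 2 if h in ("bb", "pbb") else 1   # called pot vs. checked-down pot
    return payoff if cards[player] > cards[opp] else -payoff

regret_sum = {}     # infoset -> cumulative regret per action
strategy_sum = {}   # infoset -> cumulative strategy weight per action

def get_strategy(I):
    r = regret_sum.setdefault(I, [0.0, 0.0])
    pos = [max(x, 0.0) for x in r]
    t = sum(pos)
    return [p / t for p in pos] if t > 0 else [0.5, 0.5]

def cfr(h, cards, traverser, rng):
    if is_terminal(h):
        return terminal_utility(h, cards, traverser)
    player = len(h) % 2
    I = str(cards[player]) + h
    sigma = get_strategy(I)
    if player == traverser:
        # expand every action of the traversing player
        utils = [cfr(h + a, cards, traverser, rng) for a in ACTIONS]
        node_util = sum(s * u for s, u in zip(sigma, utils))
        r = regret_sum[I]
        for i in range(2):
            r[i] += utils[i] - node_util
        return node_util
    # sample a single action for the opponent, accumulating the average strategy
    ss = strategy_sum.setdefault(I, [0.0, 0.0])
    for i in range(2):
        ss[i] += sigma[i]
    a = rng.choices(range(2), weights=sigma)[0]
    return cfr(h + ACTIONS[a], cards, traverser, rng)

def train(iterations, seed=0):
    rng = random.Random(seed)
    deck = [0, 1, 2]                   # jack, queen, king
    for _ in range(iterations):
        for traverser in (0, 1):
            rng.shuffle(deck)          # chance (the deal) is also sampled
            cfr("", (deck[0], deck[1]), traverser, rng)
    return {I: [x / sum(s) for x in s]
            for I, s in strategy_sum.items() if sum(s) > 0}
```

After training, the normalized average strategy recovers known features of Kuhn equilibrium, such as always folding a jack to a bet and always calling with a king. A production system like Slumbot applies the same traversal to an abstracted no-limit game with billions of information sets.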
BabyTartanian8 plays an approximate Nash equilibrium that was computed on the San Diego Comet supercomputer. For equilibrium finding, we used a new Monte Carlo CFR variant that leverages the recently-introduced regret-based pruning (RBP) method [Brown & Sandholm NIPS-15] to sample actions with negative regret less frequently, which dramatically speeds up convergence. Our agent uses an asymmetric action abstraction. This required conducting two separate equilibrium-finding runs.
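The core sampling idea of regret-based pruning, visiting negative-regret actions less often, can be sketched as a small helper that decides which actions a traversal should expand. The threshold-free form and the `explore_prob` parameter are assumptions for illustration; the actual RBP method of Brown & Sandholm uses a principled schedule with regret guarantees rather than this fixed probability.

```python
import random

def regret_matching(regrets):
    """Standard regret matching: play positive-regret actions proportionally."""
    pos = [max(r, 0.0) for r in regrets]
    t = sum(pos)
    return [p / t for p in pos] if t > 0 else [1.0 / len(regrets)] * len(regrets)

def actions_to_traverse(regrets, rng, explore_prob=0.05):
    """Pick which actions a CFR traversal should expand at a node.

    Actions with positive regret are always expanded; actions with negative
    regret (which regret matching currently plays with probability zero) are
    expanded only occasionally, so most of their subtrees are skipped and each
    iteration gets cheaper. (`explore_prob` is an assumed knob, not the
    schedule from the RBP paper.)"""
    return [i for i, r in enumerate(regrets)
            if r > 0 or rng.random() < explore_prob]
```

The occasional revisits matter: an action whose regret later recovers must be explored again, and the full RBP method accounts for the skipped iterations when it does so.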
Noam Brown and Tuomas Sandholm. Regret-Based Pruning in Extensive-Form Games. In Neural Information Processing Systems (NIPS), 2015.
Noam Brown, Sam Ganzfried, and Tuomas Sandholm. Hierarchical Abstraction, Distributed Equilibrium Computation, and Post-Processing, with Application to a Champion No-Limit Texas Hold'em Agent. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2015.