DOUBLE NEURAL COUNTERFACTUAL REGRET MINIMIZATION

2020-01-02

Abstract

Anonymous authors. Paper under double-blind review.

Counterfactual regret minimization (CFR) is a fundamental and effective technique for solving Imperfect Information Games (IIGs). However, the original CFR algorithm only works for discrete state and action spaces, and the resulting strategy is maintained as a tabular representation. This tabular representation prevents the method from being applied directly to large games. In this paper, we propose a double neural representation for IIGs, where one neural network represents the cumulative regret and the other represents the average strategy. These neural representations allow us to avoid manual game abstraction and carry out end-to-end optimization. To make learning efficient, we also develop several novel techniques, including a robust sampling method and a mini-batch Monte Carlo Counterfactual Regret Minimization (MCCFR) method, which may be of independent interest. Empirically, on games tractable to tabular approaches, neural strategies trained with our algorithm converge comparably to their tabular counterparts and significantly outperform those based on deep reinforcement learning. On extremely large games with billions of decision nodes, our approach achieves strong performance while using hundreds of times less memory than tabular CFR. In head-to-head matches of heads-up no-limit Texas hold'em, our neural agent beat the strong agent ABS-CFR by 9.8±4.1 chips per game. This result demonstrates a successful application of neural CFR to large games.
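For readers unfamiliar with the two quantities the abstract mentions, the following is a minimal tabular sketch of how cumulative regret and average strategy are commonly maintained in CFR via regret matching. It is not the authors' neural method: the class name `TabularInfoSet`, the `reach_weight` parameter, and the update interface are hypothetical, chosen only for illustration. In the paper's setting, the two per-node tables below are what the two neural networks would stand in for.

```python
from typing import List


def regret_matching(cumulative_regret: List[float]) -> List[float]:
    """Map cumulative regrets to a strategy: probabilities proportional
    to the positive part of each regret, uniform if none is positive."""
    positive = [max(r, 0.0) for r in cumulative_regret]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    n = len(cumulative_regret)
    return [1.0 / n] * n


class TabularInfoSet:
    """Hypothetical table entry for one information set (illustration only)."""

    def __init__(self, num_actions: int) -> None:
        self.cumulative_regret = [0.0] * num_actions
        self.cumulative_strategy = [0.0] * num_actions

    def current_strategy(self) -> List[float]:
        return regret_matching(self.cumulative_regret)

    def update(self, instantaneous_regret: List[float],
               reach_weight: float = 1.0) -> None:
        # Accumulate the strategy that was played this iteration (weighted
        # by an assumed reach probability), then add the new regrets.
        strategy = self.current_strategy()
        for a, regret in enumerate(instantaneous_regret):
            self.cumulative_strategy[a] += reach_weight * strategy[a]
            self.cumulative_regret[a] += regret

    def average_strategy(self) -> List[float]:
        total = sum(self.cumulative_strategy)
        if total > 0.0:
            return [s / total for s in self.cumulative_strategy]
        n = len(self.cumulative_strategy)
        return [1.0 / n] * n


if __name__ == "__main__":
    node = TabularInfoSet(num_actions=2)
    node.update([1.0, 3.0])
    print(node.current_strategy())
    print(node.average_strategy())
```

A table like this needs one entry per information set, which is the memory cost the abstract says the neural representation avoids.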
