Learning Deep Decentralized Policy Network by Collective Rewards for
Real-Time Combat Game
Abstract
The task in a real-time combat game is to coordinate
multiple units to defeat enemy units controlled by
a given opponent in a real-time combat scenario.
It is difficult to design a high-level Artificial
Intelligence (AI) program for such a task due to
its extremely large state-action space and real-time
requirements. This paper formulates this task
as a collective decentralized partially observable
Markov decision process, and designs a Deep
Decentralized Policy Network (DDPN) to model
the policies. To train the DDPN effectively, a novel
two-stage learning algorithm is proposed that
combines imitation learning from the opponent with
reinforcement learning driven by no-regret dynamics.
Extensive experimental results on various combat
scenarios indicate that the proposed method can
defeat different opponent models and significantly
outperforms state-of-the-art approaches.