Regret Minimization for Partially Observable Deep Reinforcement Learning

Abstract

Deep reinforcement learning algorithms that estimate state and state-action value functions have been shown to be effective in a variety of challenging domains, including learning control strategies from raw image pixels. However, algorithms that estimate state and state-action value functions typically assume a fully observed state and must compensate for partial observations by using finite-length observation histories or recurrent networks. In this work, we propose a new deep reinforcement learning algorithm based on counterfactual regret minimization that iteratively updates an approximation to an advantage-like function and is robust to partially observed state. We demonstrate that this new algorithm can substantially outperform strong baseline methods on several partially observed reinforcement learning tasks: learning first-person 3D navigation in Doom and Minecraft, and acting in the presence of partially observed objects in Doom and Pong.
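
The abstract's central mechanism, iteratively updating an advantage-like function with a counterfactual-regret-minimization-style rule and acting by regret matching, can be sketched compactly. Below is a minimal tabular illustration, assuming discrete observations; the names `cum_clipped_adv`, `policy`, and `update` are hypothetical placeholders, not the paper's implementation, which trains neural network approximators rather than tables.

```python
import numpy as np

# Minimal tabular sketch of a regret-matching (CFR+-style) policy update
# over a clipped cumulative advantage estimate. All names here are
# illustrative; in the paper's deep RL setting, the table below would be
# replaced by a learned function approximator.

n_obs, n_actions = 10, 4
cum_clipped_adv = np.zeros((n_obs, n_actions))  # running clipped advantage per (obs, action)

def policy(obs):
    # Regret matching: play actions in proportion to the positive part
    # of the cumulative advantage; fall back to uniform when all are zero.
    pos = np.maximum(cum_clipped_adv[obs], 0.0)
    total = pos.sum()
    if total <= 0.0:
        return np.full(n_actions, 1.0 / n_actions)
    return pos / total

def update(obs, advantage_estimate):
    # CFR+-style accumulation: add the newly estimated advantage for the
    # current policy, then clip the running total at zero.
    cum_clipped_adv[obs] = np.maximum(cum_clipped_adv[obs] + advantage_estimate, 0.0)

# One iteration for observation 0 with an estimated advantage vector.
update(0, np.array([0.5, -0.2, 0.1, -0.4]))
print(policy(0))  # -> probabilities proportional to [0.5, 0, 0.1, 0]
```

The clipping at zero mirrors CFR+'s positive-regret bookkeeping: actions whose cumulative advantage estimate goes negative drop out of the policy, but can re-enter once their estimate recovers.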
