DISCRIMINATIVE PARTICLE FILTER REINFORCEMENT LEARNING FOR COMPLEX PARTIAL OBSERVATIONS


2020-01-02

Abstract

Deep reinforcement learning has succeeded in sophisticated games such as Atari, Go, etc. Real-world decision making, however, often requires reasoning with partial information extracted from complex visual observations. This paper presents Discriminative Particle Filter Reinforcement Learning (DPFRL), a new reinforcement learning framework for complex partial observations. DPFRL encodes a differentiable particle filter with learned transition and observation models in a neural network, which enables reasoning with partial observations over multiple time steps. While a standard particle filter relies on a generative observation model, DPFRL learns a discriminatively parameterized model trained specifically for decision making. We show that the discriminative parameterization results in significantly improved performance, especially for tasks with complex visual observations, as it circumvents the difficulty of modeling complex observations that are irrelevant to decision making. Experiments show that DPFRL outperforms state-of-the-art POMDP RL models in Flickering Atari Games, an existing POMDP RL benchmark, and in Natural Flickering Atari Games, a new, more challenging POMDP RL benchmark introduced in this work. Further, DPFRL performs well for visual navigation with real-world data in the Habitat environment.
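As a rough illustration of the idea in the abstract, the sketch below shows one particle belief update in which particles are reweighted by a discriminative compatibility score rather than by a generative likelihood of the observation. It is a minimal toy under stated assumptions, not the paper's implementation: the functions `transition` and `obs_score`, the scalar latent state, and all constants are hypothetical stand-ins for the learned neural network components, and resampling and the policy network are omitted.

```python
import math
import random


def transition(h, action, rng):
    # Hypothetical stand-in for a learned transition model:
    # moves one latent particle forward given the action, with noise.
    return math.tanh(0.9 * h + 0.5 * action) + 0.1 * rng.gauss(0.0, 1.0)


def obs_score(h, obs_feature):
    # Hypothetical stand-in for a learned discriminative observation model.
    # It returns an unnormalized log-score for how compatible particle h is
    # with the encoded observation; it never models the observation itself.
    return -(h - obs_feature) ** 2


def belief_update(particles, log_weights, action, obs_feature, rng):
    # 1) Propagate every particle through the transition model.
    new_particles = [transition(h, action, rng) for h in particles]
    # 2) Reweight each particle by its compatibility score (in log space).
    new_log_w = [
        lw + obs_score(h, obs_feature)
        for lw, h in zip(log_weights, new_particles)
    ]
    # 3) Normalize with a numerically stable log-sum-exp.
    m = max(new_log_w)
    log_z = m + math.log(sum(math.exp(lw - m) for lw in new_log_w))
    return new_particles, [lw - log_z for lw in new_log_w]


rng = random.Random(0)
K = 8  # number of particles (arbitrary choice for the example)
particles = [rng.gauss(0.0, 1.0) for _ in range(K)]
log_weights = [-math.log(K)] * K  # uniform initial belief

particles, log_weights = belief_update(
    particles, log_weights, action=1.0, obs_feature=0.3, rng=rng
)

# A simple belief summary that a policy could consume: the weighted mean.
belief_mean = sum(math.exp(lw) * h for lw, h in zip(log_weights, particles))
```

The point of the sketch is step 2: because the score only needs to rank particles by how well they agree with the observation, the model does not have to reconstruct the parts of a complex image that do not matter for the decision.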
