Abstract
Deep reinforcement learning has succeeded in sophisticated games such as Atari and Go. Real-world decision making, however, often requires reasoning with partial information extracted from complex visual observations. This paper presents Discriminative Particle Filter Reinforcement Learning (DPFRL), a new reinforcement learning framework for complex partial observations. DPFRL encodes a differentiable particle filter with learned transition and observation models in a neural network, which enables reasoning over partial observations across multiple time steps. While a standard particle filter relies on a generative observation model, DPFRL learns a discriminatively parameterized model trained specifically for decision making. We show that the discriminative parameterization yields significantly improved performance, especially for tasks with complex visual observations, because it circumvents the difficulty of modeling observation features that are irrelevant to decision making. Experiments show that DPFRL outperforms state-of-the-art POMDP RL models on Flickering Atari Games, an existing POMDP RL benchmark, and on Natural Flickering Atari Games, a new, more challenging POMDP RL benchmark introduced in this work. Further, DPFRL performs well on visual navigation with real-world data in the Habitat environment.