资源论文Recurrent Predictive State Policy Networks

Recurrent Predictive State Policy Networks

2020-03-16 | |  63 |   42 |   0

Abstract

We introduce Recurrent Predictive State Policy (RPSP) networks, a recurrent architecture that brings insights from predictive state representations to reinforcement learning in partially observable environments. Predictive state policy networks consist of a recursive filter, which keeps track of a belief about the state of the environmen and a reactive policy that directly maps beliefs to actions. The recursive filter leverages predictive state representations (PSRs) (Rosencrantz & Gordon, 2004; Sun et al., 2016) by modeling predictive state — a prediction of the distribution of future observations conditioned on history and future actions. This representation gives rise to a rich class of statistically consistent algorithms (Hefny et al., 2018) to initialize the recursive fi ter. Predictive state serves as an equivalent repre sentation of a belief state. Therefore, the policy component of the RPSP-network can be purely reactive, simplifying training while still allowing optimal behaviour. We optimize our policy using a combination of policy gradient based on rewards (Williams, 1992) and gradient descent based on prediction error of the recursive filter. We show the efficacy of RPSP-networks under partial observability on a set of robotic control tasks from OpenAI Gym. We empirically show that RPSP-networks perform well compared with memory-preserving networks such as GRUs, as well as finite memory models, being the overall best performing method.

上一篇:Noisin: Unbiased Regularization for Recurrent Neural Networks

下一篇:Solving Partial Assignment Problems using Random Clique Complexes

用户评价
全部评价

热门资源

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Joint Pose and Ex...

    Facial expression recognition (FER) is a challe...