EXPLORING MODEL-BASED PLANNING WITH POLICY NETWORKS

2020-01-02

Abstract

Model-based reinforcement learning (MBRL) with model-predictive control or online planning has shown great potential for locomotion control tasks in both sample efficiency and asymptotic performance. Despite these successes, existing planning methods search over candidate sequences randomly generated in the action space, which is inefficient in complex high-dimensional environments. In this paper, we propose a novel MBRL algorithm, model-based policy planning (POPLIN), that combines policy networks with online planning. More specifically, we formulate action planning at each time-step as an optimization problem using neural networks. We experiment both with optimization w.r.t. the action sequences initialized from the policy network and with online optimization directly w.r.t. the parameters of the policy network. We show that in the MuJoCo benchmarking environments, POPLIN is about 3x more sample efficient than the previous state-of-the-art algorithms, such as PETS, TD3 and SAC. To explain the effectiveness of our algorithm, we show that the optimization surface in parameter space is smoother than in action space. Furthermore, we find that the distilled policy network can be effectively applied without the expensive model predictive control at test time for some environments, such as Cheetah. Code is released.
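To make the idea of optimizing action sequences initialized from a policy network more concrete, the following is a minimal sketch and not the authors' released code. It assumes a toy 1-D point-mass dynamics function in place of a learned model, a hand-written clipped linear policy in place of a policy network, and a cross-entropy-method (CEM) search as the optimizer, which the abstract does not name. All function names and hyperparameters here are hypothetical.

```python
import numpy as np

def dynamics(state, action):
    # Assumption: toy 1-D point mass standing in for a learned dynamics model.
    return state + 0.1 * action

def reward(state, action):
    # Assumption: toy reward that favors staying near the origin with small actions.
    return -(state ** 2) - 0.01 * (action ** 2)

def policy(state):
    # Assumption: clipped linear map standing in for a policy network.
    return float(np.clip(-state, -1.0, 1.0))

def policy_action_sequence(state, horizon):
    # Roll the policy through the model to propose an initial action sequence.
    actions = []
    for _ in range(horizon):
        action = policy(state)
        actions.append(action)
        state = dynamics(state, action)
    return np.array(actions)

def sequence_return(state, actions):
    # Evaluate an open-loop action sequence under the model.
    total = 0.0
    for action in actions:
        total += reward(state, action)
        state = dynamics(state, action)
    return total

def plan_action(state, horizon=10, population=64, n_elites=8, iterations=5, seed=0):
    rng = np.random.default_rng(seed)
    # The search distribution is centered on the policy's proposal
    # instead of on a fixed or random action sequence.
    mean = policy_action_sequence(state, horizon)
    std = 0.3 * np.ones(horizon)
    for _ in range(iterations):
        samples = mean + std * rng.standard_normal((population, horizon))
        candidates = np.clip(samples, -1.0, 1.0)
        returns = np.array([sequence_return(state, c) for c in candidates])
        # Refit the distribution to the highest-return candidates.
        elites = candidates[np.argsort(returns)[-n_elites:]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    # As in model-predictive control, only the first planned action is executed.
    return float(mean[0])
```

Calling `plan_action(2.0)` returns a single action in `[-1, 1]`. In a control loop the planner would be re-run at every time-step from the newly observed state. The parameter-space variant mentioned in the abstract would instead perturb the policy's weights, which this sketch does not cover.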
