IMPLEMENTATION MATTERS IN DEEP POLICY GRADIENTS: A CASE STUDY ON PPO AND TRPO


2020-01-02

Abstract

We study the roots of algorithmic progress in deep policy gradient algorithms through a case study on two popular algorithms, Proximal Policy Optimization (PPO) and Trust Region Policy Optimization (TRPO). We investigate the consequences of "code-level optimizations": algorithm augmentations found only in implementations or described as auxiliary details to the core algorithm. Seemingly of secondary importance, such optimizations have a major impact on agent behavior. Our results show that they (a) are responsible for most of PPO's gain in cumulative reward over TRPO, and (b) fundamentally change how RL methods function. These insights show the difficulty, and importance, of attributing performance gains in deep reinforcement learning.
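For context, the "core algorithm" side of PPO that the abstract contrasts with code-level optimizations is the clipped surrogate objective. The sketch below is a minimal NumPy illustration of that objective, not the authors' code; the function name and the default clipping range of 0.2 are illustrative assumptions.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective at the core of PPO (illustrative sketch).

    ratio:     pi_new(a|s) / pi_old(a|s) for sampled state-action pairs
    advantage: estimated advantages for those pairs
    eps:       clipping range (0.2 is a commonly used default, assumed here)
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    # Clip the probability ratio to [1 - eps, 1 + eps] before weighting.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Elementwise minimum makes the objective a pessimistic (lower) bound,
    # discouraging large policy updates in a single step.
    return np.mean(np.minimum(unclipped, clipped))
```

Augmentations such as reward normalization or value-function clipping sit outside this objective, which is why the paper treats them as implementation details rather than part of the algorithm proper.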
