DYNAMICS-AWARE EMBEDDINGS

2020-01-02

Abstract

In this paper we consider self-supervised representation learning to improve sample efficiency in reinforcement learning (RL). We propose a forward prediction objective for simultaneously learning embeddings of states and action sequences. These embeddings capture the structure of the environment’s dynamics, enabling efficient policy learning. We demonstrate that our action embeddings alone improve the sample efficiency and peak performance of model-free RL on control from low-dimensional states. By combining state and action embeddings, we achieve efficient learning of high-quality policies on goal-conditioned continuous control from pixel observations in only 1-2 million environment steps.
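The forward-prediction objective described above can be sketched minimally: encode the current state and a sequence of actions, predict the embedding of the future state, and penalize the prediction error. The dimensions, linear encoders, and names below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

state_dim, action_dim, k, emb_dim = 8, 2, 4, 16

# Illustrative linear "encoders" and forward model (hypothetical;
# the paper's networks are learned, nonlinear, and trained jointly).
W_s = rng.normal(size=(emb_dim, state_dim))        # state encoder e_s
W_a = rng.normal(size=(emb_dim, action_dim * k))   # action-sequence encoder e_a
W_f = rng.normal(size=(emb_dim, 2 * emb_dim))      # forward model f

def embed_state(s):
    return W_s @ s

def embed_actions(a_seq):
    # Flatten the k-step action sequence into one vector before encoding.
    return W_a @ a_seq.reshape(-1)

def predict_future(z_s, z_a):
    return W_f @ np.concatenate([z_s, z_a])

# One transition: state s_t, k actions, and the resulting state s_{t+k}.
s_t = rng.normal(size=state_dim)
a_seq = rng.normal(size=(k, action_dim))
s_tk = rng.normal(size=state_dim)

# Forward-prediction loss: predicted embedding of s_{t+k}
# versus the direct encoding of s_{t+k}.
pred = predict_future(embed_state(s_t), embed_actions(a_seq))
loss = float(np.mean((pred - embed_state(s_tk)) ** 2))
print(loss >= 0.0)
```

Minimizing this loss over many transitions shapes both embedding spaces so that nearby action-sequence embeddings lead to similar predicted outcomes, which is the structure the policy then exploits.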
