
Playing hard exploration games by watching YouTube


Abstract 

Deep reinforcement learning methods traditionally struggle with tasks where environment rewards are particularly sparse. One successful method of guiding exploration in these domains is to imitate trajectories provided by a human demonstrator. However, these demonstrations are typically collected under artificial conditions, i.e. with access to the agent’s exact environment setup and the demonstrator’s action and reward trajectories. Here we propose a two-stage method that overcomes these limitations by relying on noisy, unaligned footage without access to such data. First, we learn to map unaligned videos from multiple sources to a common representation using self-supervised objectives constructed over both time and modality (i.e. vision and sound). Second, we embed a single YouTube video in this representation to construct a reward function that encourages an agent to imitate human gameplay. This method of one-shot imitation allows our agent to convincingly exceed human-level performance on the infamously hard exploration games Montezuma's Revenge, Pitfall! and Private Eye for the first time, even if the agent is not presented with any environment rewards.
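As a rough illustration of the second stage described in the abstract, the sketch below builds a checkpoint-based imitation reward from a single embedded demonstration video. Everything here is an assumption for illustration rather than the paper's exact formulation: the embedding function `embed_fn`, the checkpoint spacing, and the similarity threshold are all placeholder choices.

```python
import numpy as np


def make_imitation_reward(demo_frames, embed_fn, checkpoint_every=16,
                          similarity_threshold=0.5):
    """Build an imitation reward from a single demonstration video.

    A minimal sketch: the demo is embedded once, checkpoints are taken
    every `checkpoint_every` frames, and the agent earns a sparse bonus
    each time its embedded observation becomes sufficiently similar to
    the next unvisited checkpoint. All names and thresholds here are
    illustrative assumptions, not the paper's published values.
    """
    # Embed every demo frame, then keep a sparse sequence of checkpoints.
    demo_embeddings = np.stack([embed_fn(frame) for frame in demo_frames])
    checkpoints = demo_embeddings[::checkpoint_every]
    # L2-normalise so cosine similarity reduces to a dot product.
    checkpoints = checkpoints / np.linalg.norm(checkpoints, axis=1, keepdims=True)

    state = {"next_checkpoint": 0}

    def reward_fn(observation):
        if state["next_checkpoint"] >= len(checkpoints):
            return 0.0  # demonstration fully traversed
        z = embed_fn(observation)
        z = z / np.linalg.norm(z)
        similarity = float(checkpoints[state["next_checkpoint"]] @ z)
        if similarity > similarity_threshold:
            state["next_checkpoint"] += 1
            return 1.0  # reached the next checkpoint along the demo
        return 0.0

    return reward_fn
```

In an agent training loop, a reward of this shape could be used on its own or added to the environment reward; the abstract's claim is that the imitation signal alone is enough to guide exploration even when no environment rewards are given.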

