Resource Paper: BLACK-BOX OFF-POLICY ESTIMATION FOR INFINITE-HORIZON REINFORCEMENT LEARNING

2020-01-02

Abstract

Off-policy estimation for long-horizon problems is important in many real-life applications such as healthcare and robotics, where high-fidelity simulators may not be available and on-policy evaluation is expensive or impossible. Recently, Liu et al. (2018) proposed an approach that avoids the curse of horizon suffered by typical importance-sampling-based methods. While showing promising results, this approach is limited in practice as it requires data to be drawn from the stationary distribution of a known behavior policy. In this work, we propose a novel approach that eliminates such limitations. In particular, we formulate the problem as solving for the fixed point of a certain operator, and develop a new estimator that computes importance ratios of stationary distributions, without knowledge of how the off-policy data are collected. We analyze its asymptotic consistency and finite-sample generalization. Experiments on benchmarks verify the effectiveness of our approach.
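As a rough illustration of the kind of estimator the abstract refers to, weighting observed rewards by estimated ratios of stationary distributions, here is a minimal Python sketch. The function name, the self-normalized weighting, and the example numbers are illustrative assumptions; the paper's actual method obtains the ratios by solving a fixed-point problem for a certain operator, which is not shown here.

```python
import numpy as np

def ratio_weighted_value_estimate(rewards, ratios):
    """Self-normalized, ratio-weighted off-policy value estimate (sketch).

    rewards : observed rewards r_i from the logged (off-policy) transitions.
    ratios  : estimated stationary-distribution importance ratios
              w(s_i, a_i) ~ d_pi(s_i, a_i) / d_D(s_i, a_i), assumed to be
              produced by some upstream estimator (e.g. a fixed-point solver).
    """
    rewards = np.asarray(rewards, dtype=float)
    ratios = np.asarray(ratios, dtype=float)
    # Weight each observed reward by its ratio and normalize by the total
    # weight, so the result is a proper weighted average of rewards.
    return float(np.sum(ratios * rewards) / np.sum(ratios))

# Illustrative usage with made-up data: three logged rewards and their
# (hypothetical) estimated stationary-distribution ratios.
print(ratio_weighted_value_estimate([1.0, 0.0, 0.5], [0.8, 1.2, 1.0]))
```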


