Representation Balancing MDPs for Off-Policy Policy Evaluation

2020-02-14

Abstract 

We study the problem of off-policy policy evaluation (OPPE) in RL. In contrast to prior work, we consider how to estimate both the individual policy value and the average policy value accurately. We draw inspiration from recent work in causal reasoning, and propose a new finite-sample generalization error bound for value estimates from MDP models. Using this upper bound as an objective, we develop an algorithm for learning an MDP model with a balanced representation, and show that our approach can yield substantially lower MSE on common synthetic benchmarks and an HIV treatment simulation domain.
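The abstract describes learning an MDP model whose state representation is "balanced" between the distribution of the logged (behavior-policy) data and the distribution the evaluation policy would induce, and then using that model for off-policy value estimates. Below is a minimal, dependency-light sketch of that general idea, not the paper's exact bound or objective: the loss combines an importance-reweighted model-fit term with a simple mean-discrepancy penalty between representations (a crude stand-in for the integral-probability-metric term implied by the bound). The toy data, the linear representation, and names such as `balance_weight` and `eval_policy_prob_a1` are illustrative assumptions.

```python
# Hypothetical sketch of a representation-balancing model-learning objective.
# Not the paper's algorithm; a toy illustration with a linear representation.

import numpy as np

rng = np.random.default_rng(0)

# Toy logged data: states, binary actions, rewards, next states from a behavior policy.
n, d, k = 512, 4, 3                        # transitions, state dim, representation dim
S = rng.normal(size=(n, d))
A = rng.integers(0, 2, size=n)
S_next = 0.9 * S + 0.05 * A[:, None] + 0.1 * rng.normal(size=(n, d))

def eval_policy_prob_a1(s):
    """Probability the (hypothetical) evaluation policy picks action 1 in state s."""
    return 1.0 / (1.0 + np.exp(-s[:, 0]))

# Importance weights that emphasize transitions the evaluation policy would generate;
# the behavior policy is assumed to choose actions uniformly at random.
behavior_prob_a1 = 0.5
pi_e = eval_policy_prob_a1(S)
w = np.where(A == 1, pi_e / behavior_prob_a1, (1 - pi_e) / (1 - behavior_prob_a1))

def loss(params, balance_weight=1.0):
    W = params[: d * k].reshape(d, k)       # representation phi(s) = s @ W
    V = params[d * k :].reshape(k + 1, d)   # linear next-state model on [phi(s), a]
    phi = S @ W
    pred = np.hstack([phi, A[:, None].astype(float)]) @ V
    model_err = ((pred - S_next) ** 2).sum(axis=1)
    # Reweighted model error approximates model error under the evaluation policy.
    fit_term = np.mean(w * model_err)
    # Representation balancing: penalize the gap between the behavior-weighted and
    # evaluation-weighted mean representations (a crude proxy for an IPM penalty).
    gap = phi.mean(axis=0) - (w[:, None] * phi).mean(axis=0)
    return fit_term + balance_weight * (gap ** 2).sum()

# Minimize with finite-difference gradient descent to keep the sketch dependency-free.
params = rng.normal(scale=0.1, size=d * k + (k + 1) * d)
lr, eps = 1e-2, 1e-5
for _ in range(200):
    base = loss(params)
    grad = np.zeros_like(params)
    for i in range(params.size):
        p = params.copy()
        p[i] += eps
        grad[i] = (loss(p) - base) / eps
    params -= lr * grad

print("final balanced objective:", loss(params))
```

In a realistic version one would use a neural representation, a proper integral probability metric (e.g., Wasserstein distance or MMD) for the balancing term as motivated by the generalization bound, and then estimate the individual and average policy values by rolling out the evaluation policy in the learned model; the finite-difference optimizer above only keeps the example self-contained.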

