Representation Balancing MDPs for Off-Policy Policy Evaluation

2020-02-14

Abstract 

We study the problem of off-policy policy evaluation (OPPE) in RL. In contrast to prior work, we consider how to estimate both the individual policy value and the average policy value accurately. We draw inspiration from recent work in causal reasoning, and propose a new finite-sample generalization error bound for value estimates from MDP models. Using this upper bound as an objective, we develop an algorithm for learning an MDP model with a balanced representation, and show that our approach can yield substantially lower MSE on common synthetic benchmarks and an HIV treatment simulation domain.
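The abstract describes learning an MDP model whose state representation is "balanced" between the distribution of the logged (behavior-policy) data and the distribution the evaluation policy would induce, and then using that model for off-policy value estimates. Below is a minimal, dependency-light sketch of that general idea, not the paper's exact bound or objective: the loss combines an importance-reweighted model-fit term with a simple mean-discrepancy penalty between representations (a crude stand-in for the integral-probability-metric term implied by the bound). The toy data, the linear representation, and names such as `balance_weight` and `eval_policy_prob_a1` are illustrative assumptions.

```python
# Hypothetical sketch of a representation-balancing model-learning objective.
# Not the paper's algorithm; a toy illustration with a linear representation.

import numpy as np

rng = np.random.default_rng(0)

# Toy logged data: states, binary actions, rewards, next states from a behavior policy.
n, d, k = 512, 4, 3                        # transitions, state dim, representation dim
S = rng.normal(size=(n, d))
A = rng.integers(0, 2, size=n)
S_next = 0.9 * S + 0.05 * A[:, None] + 0.1 * rng.normal(size=(n, d))

def eval_policy_prob_a1(s):
    """Probability the (hypothetical) evaluation policy picks action 1 in state s."""
    return 1.0 / (1.0 + np.exp(-s[:, 0]))

# Importance weights that emphasize transitions the evaluation policy would generate;
# the behavior policy is assumed to choose actions uniformly at random.
behavior_prob_a1 = 0.5
pi_e = eval_policy_prob_a1(S)
w = np.where(A == 1, pi_e / behavior_prob_a1, (1 - pi_e) / (1 - behavior_prob_a1))

def loss(params, balance_weight=1.0):
    W = params[: d * k].reshape(d, k)       # representation phi(s) = s @ W
    V = params[d * k :].reshape(k + 1, d)   # linear next-state model on [phi(s), a]
    phi = S @ W
    pred = np.hstack([phi, A[:, None].astype(float)]) @ V
    model_err = ((pred - S_next) ** 2).sum(axis=1)
    # Reweighted model error approximates model error under the evaluation policy.
    fit_term = np.mean(w * model_err)
    # Representation balancing: penalize the gap between the behavior-weighted and
    # evaluation-weighted mean representations (a crude proxy for an IPM penalty).
    gap = phi.mean(axis=0) - (w[:, None] * phi).mean(axis=0)
    return fit_term + balance_weight * (gap ** 2).sum()

# Minimize with finite-difference gradient descent to keep the sketch dependency-free.
params = rng.normal(scale=0.1, size=d * k + (k + 1) * d)
lr, eps = 1e-2, 1e-5
for _ in range(200):
    base = loss(params)
    grad = np.zeros_like(params)
    for i in range(params.size):
        p = params.copy()
        p[i] += eps
        grad[i] = (loss(p) - base) / eps
    params -= lr * grad

print("final balanced objective:", loss(params))
```

In a realistic version one would use a neural representation, a proper integral probability metric (e.g., Wasserstein distance or MMD) for the balancing term as motivated by the generalization bound, and then estimate the individual and average policy values by rolling out the evaluation policy in the learned model; the finite-difference optimizer above only keeps the example self-contained.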

