DOUBLY ROBUST BIAS REDUCTION IN INFINITE HORIZON OFF-POLICY ESTIMATION

2020-01-02

Abstract

Infinite horizon off-policy policy evaluation is a highly challenging task due to the excessively large variance of typical importance sampling (IS) estimators. Recently, Liu et al. (2018a) proposed an approach that substantially reduces the variance of infinite-horizon off-policy evaluation by estimating the stationary density ratio, but at the cost of introducing potentially high biases due to the error in density ratio estimation. In this paper, we develop a bias-reduced augmentation of their method, which can take advantage of a learned value function to improve accuracy. Our method is doubly robust in that the bias vanishes when either the density ratio or value function estimation is perfect. In general, when either of them is accurate, the bias can also be reduced. Both theoretical and empirical results show that our method yields significant advantages over previous methods.
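
For concreteness, a generic doubly robust estimator of the normalized discounted return can be sketched as follows. The notation is an illustrative assumption, not necessarily the paper's exact construction: w denotes the estimated stationary density ratio between the target and behavior state distributions, V a learned value function, rho(a|s) the single-step policy ratio pi(a|s)/pi_0(a|s), mu_0 the initial-state distribution, and D transitions (s, a, r, s') collected under the behavior policy.

```latex
% Sketch of a doubly robust off-policy estimator (illustrative notation,
% not necessarily the paper's exact formulation):
%   \hat{w}(s)      -- estimated stationary density ratio d_\pi(s)/d_{\pi_0}(s)
%   \hat{V}(s)      -- learned value function approximating V^\pi
%   \rho(a \mid s)  -- single-step policy ratio \pi(a \mid s)/\pi_0(a \mid s)
\hat{R}_{\mathrm{DR}}
  = (1-\gamma)\,\mathbb{E}_{s_0 \sim \mu_0}\!\big[\hat{V}(s_0)\big]
  + \mathbb{E}_{(s,a,r,s') \sim D}\!\Big[
      \hat{w}(s)\,\rho(a \mid s)\,
      \big(r + \gamma \hat{V}(s') - \hat{V}(s)\big)
    \Big]
```

The doubly robust property is visible in this form: if the value estimate is exact, the rho-weighted Bellman residual r + gamma*V(s') - V(s) has zero mean, so the correction term vanishes in expectation for any density-ratio estimate; if the density ratio is exact, a change of measure cancels the value terms and leaves the true return. In this sketch, setting the value function to zero recovers a plain density-ratio importance sampling estimate, so the learned value function acts as a control variate on top of the Liu et al. (2018a) style estimator.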

Previous: LEARNING TO GROUP: A BOTTOM-UP FRAMEWORK FOR 3D PART DISCOVERY IN UNSEEN CATEGORIES

Next: VARIBAD: A VERY GOOD METHOD FOR BAYES-ADAPTIVE DEEP RL VIA META-LEARNING
