Prior-Free Exploration Bonus for and beyond Near Bayes-Optimal Behavior
Kenji Kawaguchi, Hiroshi Sato

2019-11-11
Abstract

We study Bayesian reinforcement learning (RL) as a solution to the exploration-exploitation dilemma. Since full Bayesian planning is intractable except in special cases, previous work has proposed several approximation methods, but these are often computationally expensive or limited to Dirichlet priors. In this paper, we propose a new algorithm that computes a near Bayes-optimal policy in polynomial time for any prior distribution that is not greatly misspecified. Perhaps even more interestingly, the proposed algorithm naturally avoids being misled by incorrect beliefs while effectively using the useful parts of the prior information. It can work well even when an utterly misspecified prior is assigned; in that case, the algorithm instead follows PAC-MDP behavior, whenever an existing PAC-MDP algorithm would do so. The proposed algorithm outperformed the algorithms it was compared against on a standard benchmark problem.
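
The abstract does not spell out the bonus itself, so the sketch below only illustrates, in a generic way, how an exploration bonus plugs into model-based planning: a count-based bonus (in the style of MBIE-EB) added to value iteration on an estimated MDP. This is not the paper's prior-free bonus; the function name bonus_value_iteration and the arrays P_hat, R_hat, and counts are assumed purely for illustration.

```python
import numpy as np

def bonus_value_iteration(P_hat, R_hat, counts, gamma=0.95, beta=1.0, iters=500):
    """Value iteration on an estimated MDP with a count-based exploration bonus.

    P_hat  : (S, A, S) estimated transition probabilities
    R_hat  : (S, A)    estimated mean rewards
    counts : (S, A)    visit counts n(s, a)

    The bonus beta / sqrt(n(s, a)) inflates the value of rarely tried
    state-action pairs, so the greedy policy is drawn toward them.
    This is a generic illustration, not the bonus proposed in the paper.
    """
    S, A, _ = P_hat.shape
    bonus = beta / np.sqrt(np.maximum(counts, 1))   # avoid division by zero
    Q = np.zeros((S, A))
    for _ in range(iters):
        V = Q.max(axis=1)                           # greedy state values
        Q = R_hat + bonus + gamma * (P_hat @ V)     # Bellman backup with bonus
    return Q

# Example usage (hypothetical inputs):
# Q = bonus_value_iteration(P_hat, R_hat, counts)
# action = Q[state].argmax()
```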
