THE GAMBLER ’S PROBLEM AND BEYOND

资源分类

2020-01-02 |

73 |

40 |

Abstract

We analyze the Gambler’s problem, a simple reinforcement learning problem where the gambler has the chance to double or lose their bets until the target is reached. This is an early example introduced in the reinforcement learning textbook by Sutton & Barto (2018), where they mention an interesting pattern of the optimal value function with high-frequency components and repeating nonsmooth points. It is however without further investigation. We provide the exact formula for the optimal value function for both the discrete and the continuous case. Though simple as it might seem, the value function is pathological: fractal, self-similar, zero-derivative almost everywhere, not smooth on any interval, and not written as elementary functions. It is in fact one of the generalized Cantor functions, where it holds a complexity that has been uncharted thus far. Our analyses could lead insights on improving value function approximation, gradientbased algorithms, and Q-learning, in real applications and implementations.

上一篇：GraphSAINT: GRAPH SAMPLING BASED INDUCTIVEL EARNING METHOD

下一篇：PREDICTION POISONING :T OWARDS DEFENSESAGAINST DNN MODEL STEALING ATTACKS

用户评价

全部评价

还没有评论，说两句吧！

热门资源

The Variational S...

Unlike traditional images which do not offer in...
Learning to Predi...

Much of model-based reinforcement learning invo...
Stratified Strate...

In this paper we introduce Stratified Strategy ...
A Mathematical Mo...

Direct democracy, where each voter casts one vo...
Rating-Boosted La...

The performance of a recommendation system reli...

智能在线

400-630-6780
聆听.建议反馈

E-mail: support@tusaishared.com