
“How hard is my MDP?” The distribution-norm to the rescue

2020-01-19

Abstract

In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel p. In many problems, a good approximation of p is not needed. For instance, if from one state-action pair (s, a) one can only transit to states with the same value, learning p(·|s, a) accurately is irrelevant (only its support matters). This paper aims at capturing such behavior by defining a novel hardness measure for Markov Decision Processes (MDPs) based on what we call the distribution-norm. The distribution-norm w.r.t. a measure ν is defined on zero ν-mean functions f by the standard variation of f with respect to ν. We first provide a concentration inequality for the dual of the distribution-norm. This allows us to replace the problem-free, loose ||·||_1 concentration inequalities used in most previous analyses of RL algorithms with a tighter, problem-dependent hardness measure. We then show that several common RL benchmarks have low hardness when measured using the new norm. The distribution-norm captures finer properties than the number of states or the diameter and can be used to assess the difficulty of MDPs.
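To make the definition concrete, below is a minimal, illustrative Python sketch (not the authors' code). It assumes a finite state space, where the dual of the distribution-norm taken w.r.t. the true next-state distribution p, applied to the estimation error p̂ − p, reduces to a chi-square-like quantity sqrt(Σ_s (p̂(s) − p(s))² / p(s)) over the support of p; the function names and the multinomial sampling setup are assumptions made for this example. The sketch simply contrasts that quantity with the problem-free ||·||_1 distance mentioned in the abstract.

```python
import numpy as np

# Illustrative sketch only (assumed finite state space, not the paper's code):
# compare the usual l1 distance with the dual distribution-norm of (p_hat - p)
# taken w.r.t. p, which here reduces to
#   sqrt( sum_s (p_hat(s) - p(s))^2 / p(s) )  over states with p(s) > 0.

def l1_distance(p_hat, p):
    """Standard ||p_hat - p||_1 used in problem-free concentration bounds."""
    return np.abs(p_hat - p).sum()

def dual_distribution_norm(p_hat, p):
    """Dual distribution-norm of (p_hat - p) w.r.t. p on the support of p."""
    support = p > 0
    diff = p_hat[support] - p[support]
    return np.sqrt(np.sum(diff ** 2 / p[support]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = np.array([0.7, 0.2, 0.1])       # true next-state distribution p(.|s, a)
    n = 50                              # samples drawn for this state-action pair
    counts = rng.multinomial(n, p)
    p_hat = counts / n                  # empirical transition estimate
    print("l1 distance:           ", l1_distance(p_hat, p))
    print("dual distribution-norm:", dual_distribution_norm(p_hat, p))
```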

Previous: Sparse PCA with Oracle Property

Next: Do Deep Nets Really Need to be Deep?

