
Non-Stationary Approximate Modified Policy Iteration

2020-03-04

Abstract

We consider the infinite-horizon γ-discounted optimal control problem formalized by Markov Decision Processes. Running any instance of Modified Policy Iteration (a family of algorithms that can interpolate between Value and Policy Iteration) with an error ε at each iteration is known to lead to stationary policies that are at least 2γε/(1−γ)²-optimal. Variations of Value and Policy Iteration, which build ℓ-periodic non-stationary policies, have recently been shown to display a better 2γε/((1−γ)(1−γ^ℓ))-optimality guarantee. We describe a new algorithmic scheme, Non-Stationary Modified Policy Iteration, a family of algorithms parameterized by two integers m ≥ 0 and ℓ ≥ 1 that generalizes all the above-mentioned algorithms. While m allows one to interpolate between Value-Iteration-style and Policy-Iteration-style updates, ℓ specifies the period of the non-stationary policy that is output. We show that this new family of algorithms also enjoys the improved 2γε/((1−γ)(1−γ^ℓ))-optimality guarantee. Perhaps more importantly, we show, by exhibiting an original problem instance, that this guarantee is tight for all m and ℓ; this tightness was to our knowledge only known in two specific cases, Value Iteration (m = 0, ℓ = 1) and Policy Iteration (m = ∞, ℓ = 1).
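
To give a concrete feel for how the two parameters interact, below is a minimal tabular Python sketch, not the paper's exact NS-MPI scheme: each iteration takes a greedy step with respect to the current value estimate, applies that policy's Bellman operator m additional times (m = 0 behaves like Value Iteration, large m approaches Policy Iteration), and the last ℓ greedy policies are returned to be run cyclically as an ℓ-periodic non-stationary policy. The function and variable names (ns_mpi_sketch, P, R, etc.) are illustrative assumptions, and the paper's non-stationary evaluation step may differ in detail.

```python
import numpy as np

def ns_mpi_sketch(P, R, gamma=0.9, m=3, ell=2, n_iter=200):
    """Simplified non-stationary MPI-style loop on a tabular MDP (sketch only).

    P: transition tensor of shape (A, S, S); R: reward matrix of shape (S, A).
    Returns the last `ell` greedy policies, to be executed cyclically as an
    ell-periodic non-stationary policy.
    """
    S, A = R.shape
    v = np.zeros(S)
    policies = []
    for _ in range(n_iter):
        # Greedy step: Q(s, a) = R(s, a) + gamma * sum_{s'} P(a, s, s') v(s')
        q = R + gamma * np.einsum("ast,t->sa", P, v)
        pi = q.argmax(axis=1)
        policies.append(pi)
        # Partial evaluation: apply T_pi a total of (m + 1) times
        # (m = 0 reduces to a Value-Iteration-style update).
        for _ in range(m + 1):
            r_pi = R[np.arange(S), pi]
            p_pi = P[pi, np.arange(S), :]
            v = r_pi + gamma * p_pi @ v
    return policies[-ell:]

if __name__ == "__main__":
    # Usage on a small random MDP (illustrative only).
    rng = np.random.default_rng(0)
    A, S = 2, 5
    P = rng.dirichlet(np.ones(S), size=(A, S))   # row-stochastic transitions
    R = rng.random((S, A))
    cycle = ns_mpi_sketch(P, R, gamma=0.9, m=3, ell=2)
    print("ell-periodic policy (one phase per line):")
    for phase in cycle:
        print(phase)
```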
