
Approximate Policy Iteration Schemes: A Comparison

2020-03-04

Abstract

We consider the infinite-horizon discounted optimal control problem formalized by Markov Decision Processes. We focus on several approximate variations of the Policy Iteration algorithm: Approximate Policy Iteration (API) (Bertsekas & Tsitsiklis, 1996), Conservative Policy Iteration (CPI) (Kakade & Langford, 2002), a natural adaptation of the Policy Search by Dynamic Programming algorithm (Bagnell et al., 2003) to the infinite-horizon case (PSDP∞), and the recently proposed Non-Stationary Policy Iteration (NSPI(m)) (Scherrer & Lesner, 2012). For all algorithms, we describe performance bounds with respect to the per-iteration error ε, and make a comparison, paying particular attention to the concentrability constants involved, the number of iterations and the memory required. Our analysis highlights the following points: 1) The performance guarantee of CPI can be arbitrarily better than that of API, but this comes at the cost of a relative increase of the number of iterations that is exponential in 1/ε. 2) PSDP∞ enjoys the best of both worlds: its performance guarantee is similar to that of CPI, but within a number of iterations similar to that of API. 3) Contrary to API, which requires constant memory, the memory needed by CPI and PSDP∞ is proportional to their number of iterations, which may be problematic when the discount factor γ is close to 1 or the approximation error ε is close to 0; we show that the NSPI(m) algorithm allows one to make an overall trade-off between memory and performance. Simulations with these schemes confirm our analysis.
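
For context, the sketch below shows exact Policy Iteration on a small tabular MDP, the baseline that all of the schemes above approximate: API, CPI, PSDP∞ and NSPI(m) replace the exact evaluation and greedy steps with approximate ones, which is where the per-iteration error ε enters the bounds. This is a minimal illustrative sketch under assumed conventions (the toy MDP, array shapes and function name are not from the paper), not the paper's code.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9):
    """Exact Policy Iteration on a tabular MDP (illustrative sketch only).

    P     : transition tensor, shape (A, S, S), P[a, s, t] = Pr(t | s, a)
    R     : reward matrix, shape (S, A)
    gamma : discount factor (the paper's gamma)
    """
    S, A = R.shape
    pi = np.zeros(S, dtype=int)                      # arbitrary initial deterministic policy
    while True:
        # Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi.
        P_pi = P[pi, np.arange(S), :]
        r_pi = R[np.arange(S), pi]
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
        # Exact greedy improvement; the approximate schemes compared in the paper
        # only realize this step up to some error epsilon, which drives their bounds.
        q = R + gamma * np.einsum('ast,t->sa', P, v)
        new_pi = q.argmax(axis=1)
        if np.array_equal(new_pi, pi):
            return pi, v
        pi = new_pi

# Tiny random MDP to exercise the sketch.
rng = np.random.default_rng(0)
A, S = 2, 5
P = rng.random((A, S, S))
P /= P.sum(axis=2, keepdims=True)                    # make transitions row-stochastic
R = rng.random((S, A))
pi_star, v_star = policy_iteration(P, R, gamma=0.9)
print(pi_star, v_star)
```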
