Finite-time Analysis of Approximate Policy Iteration for the Linear Quadratic Regulator

2020-02-21

Abstract

We study the sample complexity of approximate policy iteration (PI) for the Linear Quadratic Regulator (LQR), building on a recent line of work using LQR as a testbed to understand the limits of reinforcement learning (RL) algorithms on continuous control tasks. Our analysis quantifies the tension between policy improvement and policy evaluation, and suggests that policy evaluation is the dominant factor in terms of sample complexity. Specifically, we show that to obtain a controller that is within ε of the optimal LQR controller, each step of policy evaluation requires at most (n + d)^3/ε^2 samples, where n is the dimension of the state vector and d is the dimension of the input vector. On the other hand, only log(1/ε) policy improvement steps suffice, resulting in an overall sample complexity of (n + d)^3 log(1/ε)/ε^2. We furthermore build on our analysis and construct a simple adaptive procedure based on ε-greedy exploration which relies on approximate PI as a sub-routine and obtains T^(2/3) regret, improving upon a recent result of Abbasi-Yadkori et al. [3].
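For readers who want the algorithmic skeleton behind the abstract, below is a minimal sketch of policy iteration for LQR in its exact, model-based form (Hewer's classical algorithm), where policy evaluation is a Lyapunov solve; in the paper's model-free setting that solve is replaced by a sample-based LSTD-Q estimate. The function name, iteration count, and toy system are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def lqr_policy_iteration(A, B, Q, R, K0, num_iters=20):
    """Exact policy iteration (Hewer's algorithm) for discrete-time LQR.

    Requires K0 to be stabilizing: the spectral radius of A + B @ K0
    must be strictly less than one so each Lyapunov solve is well posed.
    """
    K = K0
    for _ in range(num_iters):
        # Policy evaluation: the cost-to-go of u = K x is x' P x, where
        # P solves the Lyapunov equation
        #   P = (A + B K)' P (A + B K) + Q + K' R K.
        A_cl = A + B @ K
        P = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)
        # Policy improvement: greedy one-step-lookahead update w.r.t. P.
        K = -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Toy example (illustrative, not from the paper): A is already stable,
# so K0 = 0 is a valid stabilizing initial policy.
A = np.array([[0.9, 0.1],
              [0.0, 0.9]])
B = np.array([[0.0],
              [0.1]])
Q, R = np.eye(2), np.eye(1)
K_star, P_star = lqr_policy_iteration(A, B, Q, R, K0=np.zeros((1, 2)))
```

In the paper's analysis, the error of the sample-based evaluation step is what drives the (n + d)^3/ε^2 per-step sample cost, while the improvement step contracts toward the optimal controller fast enough that only about log(1/ε) iterations are needed.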

