Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost


2020-02-19

Abstract

Despite the empirical success of the actor-critic algorithm, its theoretical understanding lags behind. In a broader context, actor-critic can be viewed as an online alternating update algorithm for bilevel optimization, whose convergence is known to be fragile. To understand the instability of actor-critic, we focus on its application to linear quadratic regulators, a simple yet fundamental setting of reinforcement learning. We establish a nonasymptotic convergence analysis of actor-critic in this setting. In particular, we prove that actor-critic finds a globally optimal pair of actor (policy) and critic (action-value function) at a linear rate of convergence. Our analysis may serve as a preliminary step towards a complete theoretical understanding of bilevel optimization with nonconvex subproblems, which is NP-hard in the worst case and is often solved using heuristics.
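As a rough illustration of the alternating actor-critic scheme the abstract describes, the sketch below runs critic and actor updates on a scalar discounted LQR. This is an assumption for illustration only, not the paper's algorithm: the paper analyzes the ergodic-cost setting, and all numeric parameters here are hypothetical values chosen to keep the closed loop stable.

```python
# Scalar discounted LQR: dynamics x' = a*x + b*u, cost q*x^2 + r*u^2,
# linear policy u = -k*x. All numbers below are hypothetical.
a, b, q, r, gamma = 0.9, 0.5, 1.0, 0.1, 0.9

def critic_step(P, k, lr=0.5):
    # Move the critic's quadratic value coefficient P toward the Bellman
    # fixed point P = q + r*k^2 + gamma*(a - b*k)^2 * P for the current k.
    target = q + r * k ** 2 + gamma * (a - b * k) ** 2 * P
    return P + lr * (target - P)

def actor_step(k, P, lr=0.05):
    # Gradient step on the cost using the critic's estimate P; the gradient
    # is proportional to (r + gamma*b^2*P)*k - gamma*a*b*P.
    grad = 2 * ((r + gamma * b ** 2 * P) * k - gamma * a * b * P)
    return k - lr * grad

k, P = 0.0, 0.0
for _ in range(500):
    P = critic_step(P, k)   # critic tracks the value of the current policy
    k = actor_step(k, P)    # actor improves the policy against the critic

# At the joint fixed point, k equals the Riccati policy-improvement gain.
k_star = gamma * a * b * P / (r + gamma * b ** 2 * P)
```

The two learning rates play the roles of the two timescales: the critic update is a contraction toward the value of the current policy, and the slower actor update drives the policy gain toward the Riccati solution, mirroring the linear convergence to a globally optimal actor-critic pair that the paper establishes for the (more general) ergodic case.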


