A Dantzig Selector Approach to Temporal Difference Learning


Abstract

LSTD is a popular algorithm for value function approximation. Whenever the number of features is larger than the number of samples, it must be paired with some form of regularization. In particular, ℓ1-regularization methods tend to perform feature selection by promoting sparsity, and thus are well-suited for high-dimensional problems. However, since LSTD is not a simple regression algorithm but solves a fixed-point problem, its integration with ℓ1-regularization is not straightforward and might come with some drawbacks (e.g., the P-matrix assumption for LASSO-TD). In this paper, we introduce a novel algorithm obtained by integrating LSTD with the Dantzig Selector. We investigate the performance of the proposed algorithm and its relationship with the existing regularized approaches, and show how it addresses some of their drawbacks.
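To make the idea concrete, here is a minimal sketch of a Dantzig-Selector-style TD solver: instead of solving the empirical LSTD system Aθ = b exactly (with A = Φᵀ(Φ − γΦ′) and b = Φᵀr), it seeks the sparsest θ whose residual Aθ − b is small in the max norm, which is a linear program. The function name `dantzig_td` and the encoding via `scipy.optimize.linprog` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_td(phi, phi_next, rewards, gamma=0.95, lam=0.1):
    """Sketch of a Dantzig-Selector-style TD solver (illustrative).

    phi:      (n, d) features of the visited states
    phi_next: (n, d) features of the successor states
    rewards:  (n,)   observed rewards

    Solves   min ||theta||_1   s.t.   ||A theta - b||_inf <= lam,
    with the empirical LSTD quantities A = phi^T (phi - gamma * phi_next)
    and b = phi^T rewards.
    """
    n, d = phi.shape
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards

    # Split theta = u - v with u, v >= 0, so that ||theta||_1 = sum(u) + sum(v).
    c = np.ones(2 * d)
    # The max-norm constraint becomes two sets of linear inequalities:
    #   A (u - v) <= b + lam   and   -A (u - v) <= lam - b
    A_ub = np.vstack([np.hstack([A, -A]), np.hstack([-A, A])])
    b_ub = np.concatenate([b + lam, lam - b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (2 * d), method="highs")
    u, v = res.x[:d], res.x[d:]
    return u - v
```

On synthetic data generated from a sparse true parameter with zero Bellman residual, a small `lam` recovers the sparse solution; as in the Dantzig Selector for regression, larger values of `lam` trade residual accuracy for sparser estimates.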
