A Dantzig Selector Approach to Temporal Difference Learning


Abstract

LSTD is a popular algorithm for value function approximation. Whenever the number of features is larger than the number of samples, it must be paired with some form of regularization. In particular, ℓ1-regularization methods tend to perform feature selection by promoting sparsity, and thus are well-suited for high-dimensional problems. However, since LSTD is not a simple regression algorithm but solves a fixed-point problem, its integration with ℓ1-regularization is not straightforward and might come with some drawbacks (e.g., the P-matrix assumption for LASSO-TD). In this paper, we introduce a novel algorithm obtained by integrating LSTD with the Dantzig Selector. We investigate the performance of the proposed algorithm and its relationship with the existing regularized approaches, and show how it addresses some of their drawbacks.
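To make the idea concrete, here is a minimal sketch of a Dantzig-Selector-style TD solver: instead of solving the empirical LSTD system Aθ = b exactly (with A = Φᵀ(Φ − γΦ′) and b = Φᵀr), it seeks the sparsest θ whose residual Aθ − b is small in the max norm, which is a linear program. The function name `dantzig_td` and the encoding via `scipy.optimize.linprog` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_td(phi, phi_next, rewards, gamma=0.95, lam=0.1):
    """Sketch of a Dantzig-Selector-style TD solver (illustrative).

    phi:      (n, d) features of the visited states
    phi_next: (n, d) features of the successor states
    rewards:  (n,)   observed rewards

    Solves   min ||theta||_1   s.t.   ||A theta - b||_inf <= lam,
    with the empirical LSTD quantities A = phi^T (phi - gamma * phi_next)
    and b = phi^T rewards.
    """
    n, d = phi.shape
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards

    # Split theta = u - v with u, v >= 0, so that ||theta||_1 = sum(u) + sum(v).
    c = np.ones(2 * d)
    # The max-norm constraint becomes two sets of linear inequalities:
    #   A (u - v) <= b + lam   and   -A (u - v) <= lam - b
    A_ub = np.vstack([np.hstack([A, -A]), np.hstack([-A, A])])
    b_ub = np.concatenate([b + lam, lam - b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * (2 * d), method="highs")
    u, v = res.x[:d], res.x[d:]
    return u - v
```

On synthetic data generated from a sparse true parameter with zero Bellman residual, a small `lam` recovers the sparse solution; as in the Dantzig Selector for regression, larger values of `lam` trade residual accuracy for sparser estimates.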
