
An Analysis of State-Relevance Weights and Sampling Distributions on L1-Regularized Approximate Linear Programming Approximation Accuracy


Abstract

Recent interest in the use of L1 regularization in value function approximation includes Petrik et al.'s introduction of L1-Regularized Approximate Linear Programming (RALP). RALP is unique among L1-regularized approaches in that it approximates the optimal value function using off-policy samples. Additionally, it produces policies which outperform those of previous methods, such as LSPI. RALP's value function approximation quality is heavily affected by the choice of state-relevance weights in the objective function of the linear program and by the distribution from which samples are drawn; however, there has been no discussion of these considerations in the previous literature. In this paper, we discuss and explain the effects of the choice of state-relevance weights and sampling distribution on approximation quality, using both theoretical and experimental illustrations. The results not only provide insight into these effects but also offer intuition into the types of MDPs that are especially well suited for approximation with RALP.
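The state-relevance weights the abstract refers to enter through the objective of the approximate linear program: the LP minimizes a rho-weighted sum of approximate state values subject to sampled Bellman constraints and an L1 budget on the feature weights. The sketch below is a minimal illustration of that structure on a hypothetical chain MDP, solved with scipy.optimize.linprog; the features, regularization budget `psi`, and the MDP itself are illustrative assumptions, not the paper's formulation or experiments.

```python
"""Minimal sketch of an L1-regularized approximate linear program (RALP-style)
on a toy chain MDP, showing where the state-relevance weights `rho` appear.
The MDP, features, and regularization budget are assumptions for illustration."""
import numpy as np
from scipy.optimize import linprog

gamma = 0.95      # discount factor
n_states = 5      # toy chain: move right; reward 1 for reaching/staying in the last state
psi = 30.0        # assumed L1 budget on the feature weights
k = 3             # number of features

# Simple polynomial features phi(s) = [1, s, s^2] (an illustrative choice).
def phi(s):
    return np.array([1.0, s, s * s])

# Sampled transitions (s, r, s') from a behavior policy that always moves right.
samples = []
for s in range(n_states):
    s_next = min(s + 1, n_states - 1)
    r = 1.0 if s_next == n_states - 1 else 0.0
    samples.append((s, r, s_next))

# State-relevance weights rho over sampled states (uniform here; the paper's
# analysis concerns how this choice affects approximation quality).
rho = np.full(n_states, 1.0 / n_states)

# Split w = w_plus - w_minus with w_plus, w_minus >= 0, so the L1 constraint
# ||w||_1 <= psi becomes sum(w_plus + w_minus) <= psi.  Variable vector x = [w_plus, w_minus].
obj_w = sum(rho[s] * phi(s) for s in range(n_states))   # rho-weighted objective on w
c = np.concatenate([obj_w, -obj_w])

A_ub, b_ub = [], []
# Sampled Bellman constraints phi(s) w >= r + gamma * phi(s') w,
# rewritten as -(phi(s) - gamma * phi(s')) w <= -r.
for s, r, s_next in samples:
    row_w = -(phi(s) - gamma * phi(s_next))
    A_ub.append(np.concatenate([row_w, -row_w]))
    b_ub.append(-r)
# L1 budget: sum(w_plus) + sum(w_minus) <= psi.
A_ub.append(np.ones(2 * k))
b_ub.append(psi)

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(0, None)] * (2 * k), method="highs")
w = res.x[:k] - res.x[k:]
print("feature weights:", w)
print("approximate values:", [float(phi(s) @ w) for s in range(n_states)])
```

Changing `rho` (for example, concentrating weight on states a good policy actually visits) changes which sampled Bellman constraints are tight at the LP solution, which is the sensitivity to state-relevance weights and sampling distribution that the paper analyzes.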
