Abstract
Policy evaluation with linear function approximation is an important problem in reinforcement learning. When facing high-dimensional feature spaces, such a problem becomes extremely hard considering the computation efficiency and quality of approximations. We propose a new algorithm, LSTD(?)-RP, which leverages random projection techniques and takes eligibility traces into consideration to tackle the above two challenges. We carry out theoretical analysis of LSTD(?)-RP, and provide meaningful upper bounds of the estimation error, approximation error and total generalization error. These results demonstrate that LSTD(?)-RP can benefit from random projection and eligibility traces strategies, and LSTD(?)-RP can achieve better performances than prior LSTDRP and LSTD(?) algorithms.