Value Pursuit Iteration

2020-01-13

Abstract

Value Pursuit Iteration (VPI) is an approximate value iteration algorithm that finds a close-to-optimal policy for reinforcement learning problems with large state spaces. VPI has two main features. First, it is a nonparametric algorithm that finds a good sparse approximation of the optimal value function given a dictionary of features. The algorithm is almost insensitive to the number of irrelevant features. Second, after each iteration of VPI, the algorithm adds a set of functions based on the currently learned value function to the dictionary. This increases the representation power of the dictionary in a way that is directly relevant to the goal of having a good approximation of the optimal value function. We theoretically study VPI and provide a finite-sample error upper bound for it.
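
The two features described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify the sparse regressor, the sampling scheme, or exactly which functions are added to the dictionary. The sketch assumes a small tabular MDP with known transitions `P` and rewards `R` (so Bellman backups are exact rather than estimated from samples), uses greedy orthogonal matching pursuit as the sparse fitting step, and grows the dictionary by appending only the current value estimate. The names `omp` and `value_pursuit_sketch` are hypothetical.

```python
import numpy as np


def omp(Phi, y, n_nonzero):
    """Greedy orthogonal matching pursuit (assumed sparse regressor).

    Selects at most n_nonzero columns of Phi and least-squares fits y on them.
    """
    n_features = Phi.shape[1]
    norms = np.linalg.norm(Phi, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero columns
    selected = []
    coef = np.zeros(n_features)
    sol = None
    residual = y.copy()
    for _ in range(min(n_nonzero, n_features)):
        corr = np.abs(Phi.T @ residual) / norms
        if selected:
            corr[selected] = -np.inf  # never pick the same column twice
        j = int(np.argmax(corr))
        if corr[j] <= 1e-12:  # residual is already (numerically) explained
            break
        selected.append(j)
        sol, *_ = np.linalg.lstsq(Phi[:, selected], y, rcond=None)
        residual = y - Phi[:, selected] @ sol
    if selected:
        coef[selected] = sol
    return coef


def value_pursuit_sketch(P, R, dictionary, gamma=0.9, n_iters=20, sparsity=3):
    """Approximate value iteration with a sparse fit and a growing dictionary.

    P: (A, S, S) transition probabilities, R: (A, S) rewards,
    dictionary: (S, d) feature matrix. All of these are assumptions of the sketch.
    """
    V = np.zeros(P.shape[1])
    Phi = dictionary.copy()
    for _ in range(n_iters):
        # Exact Bellman optimality backup of the current estimate.
        target = np.max(R + gamma * (P @ V), axis=0)
        # Sparse approximation of the backup within the current dictionary.
        V = Phi @ omp(Phi, target, sparsity)
        # Assumed augmentation rule: append the new estimate as one more atom,
        # so later sparse fits can reuse it with a single coefficient.
        Phi = np.column_stack([Phi, V])
    greedy_policy = np.argmax(R + gamma * (P @ V), axis=0)
    return V, greedy_policy, Phi


# Small random MDP used only to exercise the sketch.
rng = np.random.default_rng(0)
n_actions, n_states = 2, 6
P_demo = rng.random((n_actions, n_states, n_states))
P_demo /= P_demo.sum(axis=2, keepdims=True)
R_demo = rng.random((n_actions, n_states))
V_demo, policy_demo, Phi_demo = value_pursuit_sketch(
    P_demo, R_demo, np.eye(n_states), sparsity=3
)
```

With an identity dictionary and `sparsity` equal to the number of states, each fit is exact and the loop reduces to ordinary value iteration. A smaller `sparsity` forces each iterate to be a combination of a few atoms, which is where the appended value estimates become useful.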
