Mean-Variance Optimization in Markov Decision Processes

2020-02-27

Abstract

We consider finite-horizon Markov decision processes under performance measures that involve both the mean and the variance of the cumulative reward. We show that either randomized or history-based policies can improve performance. We prove that computing a policy that maximizes the mean reward under a variance constraint is NP-hard in some cases and strongly NP-hard in others. Finally, we offer pseudo-polynomial exact and approximation algorithms.
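The abstract does not describe the algorithms themselves. As a minimal illustration (not the authors' construction), the sketch below computes the exact distribution of the cumulative reward of a finite-horizon MDP under a possibly randomized Markov policy. It propagates probability mass over augmented states (state, reward accumulated so far), and then reads off the mean and variance. With integer rewards, the number of distinct accumulated values is bounded by the horizon times the largest reward magnitude, which is one way pseudo-polynomial running times arise. The encoding `transitions[s][a] -> [(prob, next_state, reward), ...]`, the helper names `reward_distribution`, `mean_var`, and `mixed`, and the one-step toy MDP are all hypothetical choices made for this example.

```python
from collections import defaultdict


def reward_distribution(horizon, init_state, transitions, policy):
    """Exact distribution of the cumulative reward over `horizon` steps.

    Hypothetical encoding: transitions[s][a] is a list of
    (prob, next_state, reward) triples, and policy(t, s) returns
    {action: prob}, i.e. a randomized Markov policy.
    Returns {total_reward: probability}.
    """
    # Probability mass over augmented states (state, reward accumulated so far).
    mass = {(init_state, 0): 1.0}
    for t in range(horizon):
        nxt = defaultdict(float)
        for (s, acc), p in mass.items():
            for a, pa in policy(t, s).items():
                if pa == 0:
                    continue  # action never chosen; skip it
                for q, s_next, r in transitions[s][a]:
                    nxt[(s_next, acc + r)] += p * pa * q
        mass = nxt
    # Marginalize out the final state, keeping only the total reward.
    dist = defaultdict(float)
    for (_, acc), p in mass.items():
        dist[acc] += p
    return dict(dist)


def mean_var(dist):
    """Mean and variance of a finite reward distribution {value: prob}."""
    mean = sum(r * p for r, p in dist.items())
    var = sum((r - mean) ** 2 * p for r, p in dist.items())
    return mean, var


# Hypothetical one-step toy MDP: "safe" pays 0 for sure,
# "risky" pays 10 or 0 with equal probability.
TOY = {
    "s0": {
        "safe": [(1.0, "end", 0)],
        "risky": [(0.5, "end", 10), (0.5, "end", 0)],
    }
}


def mixed(p):
    """Randomized policy: play "risky" with probability p, else "safe"."""
    return lambda t, s: {"safe": 1.0 - p, "risky": p}


if __name__ == "__main__":
    for name, pol in [("safe", mixed(0.0)), ("risky", mixed(1.0)), ("mix p=0.2", mixed(0.2))]:
        m, v = mean_var(reward_distribution(1, "s0", TOY, pol))
        print(f"{name}: mean={m:.2f} var={v:.2f}")
```

Running it prints mean 0 and variance 0 for `safe`, mean 5 and variance 25 for `risky`, and mean 1 and variance 9 for the mixture. Under an illustrative variance cap of 10, the only feasible deterministic choice in this toy is `safe` (mean 0), while randomizing with p = 0.2 stays feasible and raises the mean to 1. This is a small instance of the abstract's point that randomized policies can improve performance. The sketch does not cover history-dependent policies or the paper's hardness results.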



Popular resources

  • The Variational S...

    Unlike traditional images which do not offer in...

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • Learning to learn...

    The move from hand-designed features to learned...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...