Interactive Value Iteration for Markov Decision Processes with Unknown Rewards

Abstract: To tackle the potentially hard task of defining the reward function in a Markov Decision Process, we propose a new approach, based on Value Iteration, which interweaves the elicitation and optimization phases. We assume that rewards whose numeric values are unknown can only be ordered, and that a tutor is present to help compare sequences of rewards. We first show how the set of possible reward functions for a given preference relation can be represented as a polytope. Then our algorithm, called Interactive Value Iteration, searches for an optimal policy while refining its knowledge about the possible reward functions, querying the tutor only when necessary. We prove that the number of queries needed before finding an optimal policy is upper-bounded by a polynomial in the size of the problem, and we present experimental results which demonstrate that our approach is efficient in practice.
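To make the interleaving of elicitation and optimization concrete, below is a minimal Python sketch of the idea as the abstract describes it: values are kept as vectors over K ordered but numerically unknown reward levels, the feasible reward functions form a polytope of linear constraints, and any Bellman-backup comparison that the polytope cannot resolve triggers a tutor query whose answer is added as a new constraint. The helpers `ask_tutor` and `better`, the occupancy-vector value representation, and the simulated ground-truth reward are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch of the Interactive Value Iteration idea, assuming:
#  - each state-action pair yields one of K ordered, numerically unknown
#    reward levels r[0] <= r[1] <= ... <= r[K-1], normalised to [0, 1];
#  - values are K-vectors of discounted occupancy weights, so the scalar
#    value under a reward vector r is the dot product weights . r;
#  - ask_tutor(u, v) is a hypothetical oracle answering "is u.r >= v.r?".
# This is an illustrative reconstruction, not the authors' exact algorithm.
import numpy as np
from scipy.optimize import linprog

K = 3                       # number of distinct reward levels
gamma = 0.9                 # discount factor

# Polytope of feasible reward vectors, as constraints A r <= b,
# initialised with the known ordering r[i] <= r[i+1].
A, b = [], []
for i in range(K - 1):
    row = np.zeros(K); row[i] = 1.0; row[i + 1] = -1.0
    A.append(row); b.append(0.0)           # r[i] - r[i+1] <= 0

def dominates(u, v):
    """True iff u.r >= v.r for every feasible r (checked by one LP)."""
    c = u - v                              # minimise (u - v).r over the polytope
    res = linprog(c, A_ub=np.array(A), b_ub=np.array(b), bounds=[(0, 1)] * K)
    return res.status == 0 and res.fun >= -1e-9

def ask_tutor(u, v):
    """Hypothetical tutor oracle, simulated here with a hidden
    ground-truth reward vector that the algorithm never sees."""
    true_r = np.array([0.0, 0.4, 1.0])
    return u @ true_r >= v @ true_r

def better(u, v):
    """Return the preferred vector, querying the tutor only when the
    current polytope cannot decide the comparison."""
    if dominates(u, v):
        return u
    if dominates(v, u):
        return v
    if ask_tutor(u, v):                    # record the answer u.r >= v.r
        A.append(v - u); b.append(0.0)     # as the cut (v - u).r <= 0
        return u
    A.append(u - v); b.append(0.0)
    return v

def interactive_value_iteration(S, actions, P, level, n_iter=100):
    """Vector-valued value iteration: V[s] is a K-vector; level[s][a] is
    the index of the reward level earned in state s under action a."""
    V = np.zeros((S, K))
    for _ in range(n_iter):
        for s in range(S):
            best = None
            for a in actions:
                q = gamma * P[s][a] @ V    # expected next-state vector value
                q[level[s][a]] += 1.0      # immediate reward at its level
                best = q if best is None else better(best, q)
            V[s] = best
    return V
```

Each tutor answer shrinks the polytope by one half-space, which is what drives the paper's polynomial bound on the number of queries: once the polytope decides every remaining comparison, the algorithm proceeds as ordinary value iteration without further interaction.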
