
Probably Approximately Correct Learning in Stochastic Games with Temporal Logic Specifications

2019-11-25
Abstract We consider a controller synthesis problem in turn-based stochastic games with both a qualitative linear temporal logic (LTL) constraint and a quantitative discounted-sum objective. For each case in which the LTL specification is realizable and can be equivalently transformed into a deterministic Büchi automaton, we show that there always exists a memoryless almost-sure winning strategy that is ε-optimal with respect to the discounted-sum objective, for any arbitrary positive ε. Building on the idea of the R-MAX algorithm, we propose a probably approximately correct (PAC) learning algorithm that can learn such a strategy efficiently in an online manner with a priori unknown reward functions and unknown transition distributions. To the best of our knowledge, this is the first result on PAC learning in stochastic games with independent quantitative and qualitative objectives.
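The learning step described in the abstract builds on the R-MAX idea of acting optimistically toward state-action pairs that have not yet been sampled enough times. As a rough illustration only, the sketch below shows an R-MAX-style learner for a plain finite MDP in Python; the `step` callback, the constants, and the restriction to the discounted-sum part (no LTL constraint or Büchi product) are all assumptions of this sketch, not the paper's algorithm.

```python
from collections import defaultdict

# Minimal R-MAX-style sketch for a generic finite MDP (hypothetical interface).
# States and actions are small integers; step(s, a) returns (next_state, reward).
# All names and constants here are illustrative, not taken from the paper.

def rmax_learn(n_states, n_actions, step, r_max,
               m=10, gamma=0.95, episodes=200, horizon=50, vi_iters=200):
    counts = defaultdict(int)                        # visit counts per (s, a)
    trans = defaultdict(lambda: defaultdict(int))    # empirical transition counts
    rew = defaultdict(float)                         # accumulated rewards per (s, a)

    def q_value(s, a, V):
        # Unknown pairs (fewer than m visits) get the optimistic value
        # r_max / (1 - gamma); known pairs use the empirical model.
        if counts[(s, a)] < m:
            return r_max / (1.0 - gamma)
        n = counts[(s, a)]
        r_hat = rew[(s, a)] / n
        exp_v = sum(c / n * V[s2] for s2, c in trans[(s, a)].items())
        return r_hat + gamma * exp_v

    def plan():
        # Value iteration on the optimistic model, then act greedily.
        V = [0.0] * n_states
        for _ in range(vi_iters):
            for s in range(n_states):
                V[s] = max(q_value(s, a, V) for a in range(n_actions))
        return [max(range(n_actions), key=lambda a: q_value(s, a, V))
                for s in range(n_states)]

    for _ in range(episodes):
        policy = plan()
        s = 0                                        # assume state 0 is initial
        for _ in range(horizon):
            a = policy[s]
            s2, r = step(s, a)
            if counts[(s, a)] < m:                   # update only while "unknown"
                counts[(s, a)] += 1
                trans[(s, a)][s2] += 1
                rew[(s, a)] += r
                if counts[(s, a)] == m:              # pair just became "known"
                    policy = plan()                  # replan with the updated model
            s = s2
    return plan()
```

The known/unknown threshold m is what drives PAC-style guarantees in R-MAX: with enough samples per pair, the empirical model is accurate with high probability, so the greedy policy on it is near-optimal. The paper's setting additionally composes the game with a deterministic Büchi automaton for the LTL constraint, which this sketch deliberately omits.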
