Point-Based Value Iteration for Constrained POMDPs


2019-11-12


Abstract

Constrained partially observable Markov decision processes (CPOMDPs) extend standard POMDPs by allowing the specification of constraints on some aspects of the policy in addition to the optimality objective for the value function. CPOMDPs have many practical advantages over standard POMDPs, since they naturally model problems involving limited resources or multiple objectives. In this paper, we show that the optimal policies in CPOMDPs can be randomized, and we present exact and approximate dynamic programming methods for computing randomized optimal policies. The exact method requires solving a minimax quadratically constrained program (QCP) in each dynamic programming update, whereas the approximate method uses point-based value updates with a linear program (LP). We show that randomized policies are significantly better than deterministic ones. We also demonstrate that the approximate point-based method scales to large problems.
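Why can randomization help once a constraint is added? A one-step toy problem already shows it. The sketch below is a hypothetical illustration, not the paper's minimax QCP or point-based LP method: it assumes a single decision with known per-action rewards and costs and a cost budget, and solves the small LP "maximize expected reward subject to expected cost <= budget" over action distributions by enumerating the LP's vertices. The function names and all numbers are invented for illustration.

```python
from itertools import combinations


def best_deterministic(rewards, costs, budget):
    """Best single action whose cost fits the budget; (None, -inf) if none does."""
    feasible = [a for a in range(len(rewards)) if costs[a] <= budget]
    if not feasible:
        return None, float("-inf")
    a = max(feasible, key=lambda i: rewards[i])
    return a, rewards[a]


def best_randomized(rewards, costs, budget):
    """Solve the toy LP over action distributions p:

        maximize   sum_a p[a] * rewards[a]
        subject to sum_a p[a] * costs[a] <= budget,  sum_a p[a] = 1,  p >= 0.

    Every vertex of this feasible set is either a single feasible action or
    a mix of two actions with the budget constraint tight, so checking those
    candidates finds an optimal vertex. Returns (p, value), or (None, -inf)
    when no distribution satisfies the budget.
    """
    n = len(rewards)
    best_p, best_v = None, float("-inf")
    # Candidates that put all probability mass on one action.
    for a in range(n):
        if costs[a] <= budget and rewards[a] > best_v:
            best_p = [0.0] * n
            best_p[a] = 1.0
            best_v = rewards[a]
    # Candidates that mix two actions so the expected cost equals the budget.
    for a, b in combinations(range(n), 2):
        if costs[a] == costs[b]:
            continue  # mixing cannot change the expected cost; singles cover it
        q = (budget - costs[b]) / (costs[a] - costs[b])  # probability of action a
        if 0.0 < q < 1.0:
            value = q * rewards[a] + (1.0 - q) * rewards[b]
            if value > best_v:
                best_p = [0.0] * n
                best_p[a], best_p[b] = q, 1.0 - q
                best_v = value
    return best_p, best_v


if __name__ == "__main__":
    rewards = [1.0, 10.0]  # hypothetical: action 0 is "safe", action 1 is "risky"
    costs = [0.0, 10.0]
    budget = 5.0
    print(best_deterministic(rewards, costs, budget))  # (0, 1.0)
    print(best_randomized(rewards, costs, budget))     # ([0.5, 0.5], 5.5)
```

With these made-up numbers, the only feasible deterministic choice is the safe action (value 1), while taking each action with probability 0.5 meets the budget exactly and earns an expected reward of 5.5. Vertex enumeration is used only because this toy LP has a single inequality constraint; a general LP solver would be the natural choice for anything larger.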
