资源论文?-min: A Compact Approximate Solver For Finite-Horizon POMDPs

?-min: A Compact Approximate Solver For Finite-Horizon POMDPs

2019-11-19 | |  69 |   35 |   0
Abstract In many POMDP applications in computational sustainability, it is important that the computed policy have a simple description, so that it can be easily interpreted by stakeholders and decision makers. One measure of simplicity for POMDP value functions is the number of ?-vectors required to represent the value function. Existing POMDP methods seek to optimize the accuracy of the value function, which can require a very large number of ?-vectors. This paper studies methods that allow the user to explore the tradeoff between the accuracy of the value function and the number of ?vectors. Building on previous point-based POMDP solvers, this paper introduces a new algorithm (?min) that formulates a Mixed Integer Linear Program (MILP) to calculate approximate solutions for finite-horizon POMDP problems with limited numbers of ?-vectors. At each time-step, ?-min calculates ?-vectors to greedily minimize the gap between current upper and lower bounds of the value function. In doing so, good upper and lower bounds are quickly reached allowing a good approximation of the problem with few ?-vectors. Experimental results show that ?-min provides good approximate solutions given a fixed number of ?-vectors on small benchmark problems, on a larger randomly generated problem, as well as on a computational sustainability problem to best manage the endangered Sumatran tiger.

上一篇:Reasoning about Connectivity Constraints

下一篇:When Security Games Go Green: Designing Defender Strategies to Prevent Poaching and Illegal Fishing

用户评价
全部评价

热门资源

  • Learning to Predi...

    Much of model-based reinforcement learning invo...

  • Stratified Strate...

    In this paper we introduce Stratified Strategy ...

  • The Variational S...

    Unlike traditional images which do not offer in...

  • A Mathematical Mo...

    Direct democracy, where each voter casts one vo...

  • Rating-Boosted La...

    The performance of a recommendation system reli...