Abstract
In this paper, we seek robust policies for uncertain Markov Decision Processes (MDPs). Most robust optimization approaches for these problems have focused on the computation of maximin policies, which maximize the value corresponding to the worst realization of the uncertainty. Recent work has proposed minimax regret as a suitable alternative to the maximin objective for robust optimization. However, existing algorithms for handling minimax regret are restricted to models with uncertainty over rewards only. We provide algorithms that employ sampling to improve across multiple dimensions: (a) handling uncertainty over both transition and reward models; (b) accommodating dependence of model uncertainties across state-action pairs and decision epochs; (c) offering scalability and quality bounds. Finally, to demonstrate the empirical effectiveness of our sampling approaches, we provide comparisons against benchmark algorithms on two domains from the literature. We also provide a Sample Average Approximation (SAA) analysis to compute a posteriori error bounds.