Planning in entropy-regularized Markov decision processes and games



2020-02-21



Abstract

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser exploits the smoothness of the Bellman operator induced by the regularization to achieve a problem-independent sample complexity of order $\tilde{O}(1/\varepsilon^4)$ for a desired accuracy $\varepsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
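To illustrate what "smoothness promoted by the regularization" can mean, here is a minimal sketch that is not taken from the paper's code. It assumes the common form of entropy regularization in which the hard maximum over action values is replaced by a temperature-scaled log-sum-exp. The function names `soft_max_value` and `hard_max_value`, the temperature parameter `lam`, and the sample action values are all hypothetical and chosen only for illustration.

```python
import math


def soft_max_value(q_values, lam):
    """Smoothed maximum: lam * log(sum(exp(q / lam))).

    Assumed form of the entropy-regularized backup over action values.
    Subtracting the largest value first keeps the exponentials stable.
    """
    m = max(q_values)
    return m + lam * math.log(sum(math.exp((q - m) / lam) for q in q_values))


def hard_max_value(q_values):
    """Non-regularized backup: the plain (non-smooth) maximum."""
    return max(q_values)


# Hypothetical action values for a single state.
q = [1.0, 0.5, -0.2]

# As the temperature shrinks, the smoothed value approaches the hard maximum.
for lam in (1.0, 0.1, 0.01):
    print(lam, soft_max_value(q, lam), hard_max_value(q))
```

Under this assumed form, the smoothed value always lies between the hard maximum and the hard maximum plus `lam * log(K)` for `K` actions, and it is differentiable in the action values, which the plain maximum is not.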