Online Regret Bounds for Undiscounted Continuous Reinforcement Learning


Abstract

We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space. The proposed algorithm combines state aggregation with the use of upper confidence bounds for implementing optimism in the face of uncertainty. Besides the existence of an optimal policy which satisfies the Poisson equation, the only assumptions made are Hölder continuity of rewards and transition probabilities.
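For reference, the two assumptions named in the abstract are commonly stated as follows; the notation below follows the standard formulation and is not quoted from the paper: Hölder continuity of rewards and transition probabilities in the state variable, and a Poisson equation for an optimal policy.

```latex
% Hölder continuity (L > 0 and \alpha \in (0, 1] are the Hölder constants):
\[
  |r(s,a) - r(s',a)| \;\le\; L\,|s - s'|^{\alpha},
  \qquad
  \bigl\| p(\cdot \mid s, a) - p(\cdot \mid s', a) \bigr\|_{1}
    \;\le\; L\,|s - s'|^{\alpha}.
\]

% Poisson equation for an optimal policy \pi^* with optimal average
% reward \rho^* and bias function \lambda:
\[
  \rho^{*} + \lambda(s)
    \;=\; r\bigl(s, \pi^{*}(s)\bigr)
      + \int \lambda(s')\, p\bigl(ds' \mid s, \pi^{*}(s)\bigr).
\]
```

As a rough illustration of the state-aggregation-plus-UCB idea, the sketch below discretizes a one-dimensional state space and keeps optimistic reward estimates per aggregated state. This is a minimal sketch, not the paper's actual algorithm: all class and parameter names are assumptions, and the paper's method additionally builds confidence sets over aggregated transition probabilities and plans optimistically over the resulting set of plausible MDPs.

```python
import numpy as np

# Minimal sketch of state aggregation + upper confidence bounds for a
# one-dimensional state space [0, 1]. All names and constants here are
# illustrative assumptions; this is not the paper's exact algorithm.

class AggregatedUCB:
    def __init__(self, n_bins, n_actions, confidence=1.0):
        self.n_bins = n_bins
        self.n_actions = n_actions
        self.confidence = confidence
        # Visit counts and accumulated rewards per (aggregated state, action).
        self.counts = np.zeros((n_bins, n_actions))
        self.reward_sums = np.zeros((n_bins, n_actions))

    def bin_of(self, state):
        # State aggregation: map a continuous state in [0, 1] to a bin index.
        return min(int(state * self.n_bins), self.n_bins - 1)

    def act(self, state, t):
        # Optimism in the face of uncertainty: empirical mean reward plus a
        # confidence bonus that shrinks with the number of visits.
        s = self.bin_of(state)
        n = np.maximum(self.counts[s], 1.0)
        means = self.reward_sums[s] / n
        bonus = self.confidence * np.sqrt(np.log(max(t, 2)) / n)
        return int(np.argmax(means + bonus))

    def update(self, state, action, reward):
        s = self.bin_of(state)
        self.counts[s, action] += 1.0
        self.reward_sums[s, action] += reward
```

The sketch captures only the reward-side optimism and the aggregation step; in the paper's undiscounted setting, the regret analysis hinges on the transition-side confidence sets as well, with the aggregation granularity chosen to balance discretization error (controlled by the Hölder assumption) against estimation error.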
