Fast deep reinforcement learning using online adjustments from the past

2020-02-13

Abstract

We propose Ephemeral Value Adjustments (EVA): a means of allowing deep reinforcement learning agents to rapidly adapt to experience in their replay buffer. EVA shifts the value predicted by a neural network with an estimate of the value function found by planning over experience tuples from the replay buffer near the current state. EVA brings together several recent ideas for incorporating episodic-memory-like structures into reinforcement learning agents: slot-based storage, content-based retrieval, and memory-based planning. We show that EVA performs well on a demonstration task and Atari games.
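The abstract does not spell out the adjustment rule, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes the shift is a convex combination controlled by a hypothetical mixing weight `lam`, and it replaces planning over experience tuples with a much simpler stand-in: averaging the stored action values of the `k` stored states nearest to the current state. All function names, the toy buffer, and the parameters `k` and `lam` are illustrative and are not taken from the paper.

```python
import math

def nearest_indices(query, keys, k):
    """Content-based retrieval: indices of the k stored states closest to query."""
    dists = [math.dist(query, key) for key in keys]
    return sorted(range(len(keys)), key=lambda i: dists[i])[:k]

def nonparametric_q(query, keys, stored_q, k=3):
    """Buffer-based estimate: mean of the neighbours' stored action values.

    Assumption: this averaging stands in for the planning step described
    in the abstract; it is not the paper's procedure.
    """
    idx = nearest_indices(query, keys, k)
    n_actions = len(stored_q[0])
    return [sum(stored_q[i][a] for i in idx) / len(idx) for a in range(n_actions)]

def adjusted_q(q_net, q_np, lam=0.5):
    """Shift the network's prediction toward the buffer-based estimate.

    Assumption: a convex combination with hypothetical weight lam.
    """
    return [lam * p + (1.0 - lam) * n for p, n in zip(q_net, q_np)]

# Toy slot-based buffer: four stored 2-D state embeddings, two actions each.
keys = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
stored_q = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 9.0]]

query = (0.05, 0.05)   # embedding of the current state
q_net = [0.0, 0.5]     # stand-in for the neural network's output

q_np = nonparametric_q(query, keys, stored_q, k=3)
q_eva = adjusted_q(q_net, q_np, lam=0.5)
greedy_action = max(range(len(q_eva)), key=lambda a: q_eva[a])
```

In this toy setting the network alone would prefer action 1, while the adjusted values prefer action 0, because the three nearby stored states all favour it. The adjustment is computed at decision time and leaves the network's parameters untouched, which is one way to read "ephemeral" in the method's name.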
