Q-Error as a Selection Mechanism in Modular Reinforcement-Learning Systems


2019-11-12


Abstract: This paper introduces a novel multi-modular method for reinforcement learning. A multi-modular system is one that partitions the learning task among a set of experts (modules), where each expert is incapable of solving the entire task by itself. There are many advantages to splitting up large tasks in this way, but existing methods face difficulties when choosing which module(s) should contribute to the agent's actions at any particular moment. We introduce a novel selection mechanism where every module, besides calculating a set of action values, also estimates its own error for the current input. The selection mechanism combines each module's estimate of long-term reward and self-error to produce a score by which the next module is chosen. As a result, the modules can use their resources effectively and efficiently divide up the task. The system is shown to learn complex tasks even when the individual modules use only linear function approximators.
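The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the reward estimate and the self-error estimate are combined, so the rule used here (best action value minus a weighted predicted error) is an assumption chosen for illustration, not the paper's formula. The names `LinearModule`, `select`, and `error_penalty` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


def dot(weights: Sequence[float], features: Sequence[float]) -> float:
    """Plain dot product, standing in for a linear function approximator."""
    return sum(w * x for w, x in zip(weights, features))


@dataclass
class LinearModule:
    """Hypothetical module: linear action values plus a linear self-error estimate."""
    q_weights: List[List[float]]  # one weight vector per action
    err_weights: List[float]      # weights for the predicted self-error

    def action_values(self, features: Sequence[float]) -> List[float]:
        return [dot(w, features) for w in self.q_weights]

    def predicted_error(self, features: Sequence[float]) -> float:
        # Assumption: the error estimate is a magnitude, so clip it at zero.
        return max(0.0, dot(self.err_weights, features))


def select(modules: Sequence[LinearModule],
           features: Sequence[float],
           error_penalty: float = 1.0) -> Tuple[int, int]:
    """Return (module index, greedy action index) for the highest-scoring module.

    Assumed score: best action value minus error_penalty times predicted error.
    """
    best = None
    for i, module in enumerate(modules):
        q = module.action_values(features)
        action = max(range(len(q)), key=q.__getitem__)
        score = q[action] - error_penalty * module.predicted_error(features)
        if best is None or score > best[0]:
            best = (score, i, action)
    return best[1], best[2]


modules = [
    LinearModule(q_weights=[[1.0, 0.0], [0.0, 1.0]], err_weights=[0.0, 2.0]),
    LinearModule(q_weights=[[0.5, 0.5], [0.2, 0.2]], err_weights=[0.1, 0.1]),
]

# Module 0 is confident when the second feature is off, so it is chosen here.
print(select(modules, [1.0, 0.0]))  # (0, 0)
# With the second feature on, module 0 predicts a large error and module 1 wins.
print(select(modules, [1.0, 1.0]))  # (1, 0)
```

Under this assumed rule, a module that predicts a high value but also a high error for the current input loses to a module with a more modest but more reliable estimate, which is one way the task could end up divided among modules.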
