KEEP DOING WHAT WORKED: BEHAVIOR MODELLING PRIORS FOR OFFLINE REINFORCEMENT LEARNING

2020-01-02

Abstract

Off-policy reinforcement learning algorithms promise to be applicable in settings where only a fixed data-set (batch) of environment interactions is available and no new experience can be acquired. This property makes these algorithms appealing for real-world problems such as robot control. In practice, however, standard off-policy algorithms fail in the batch setting for continuous control. In this paper, we propose a simple solution to this problem. It admits the use of data generated by arbitrary behavior policies and uses a learned prior – the advantage-weighted behavior model (ABM) – to bias the RL policy towards actions that have previously been executed and are likely to be successful on the new task. Our method can be seen as an extension of recent work on batch-RL that enables stable learning from conflicting data-sources. We find improvements on competitive baselines in a variety of RL tasks – including standard continuous control benchmarks and multitask learning for simulated and real-world robots. Videos are available at https://sites.google.com/view/behavior-modelling-priors.
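
The abstract only sketches the method at a high level. Below is a minimal, illustrative PyTorch sketch (not the authors' code) of the two ideas it names: fitting a behavior prior by advantage-weighted maximum likelihood on the fixed batch, and improving the RL policy while penalizing divergence from that prior rather than from the raw behavior data. The Gaussian policy class, the filter f(A) = 1[A > 0], the critic interface, and the KL weight are assumptions for illustration, not details taken from the paper.

```python
# Minimal sketch of an advantage-weighted behavior model (ABM) prior.
# Assumes continuous actions, a Gaussian policy, and a pre-computed
# advantage estimate for every transition in the fixed batch.
import torch
import torch.nn as nn
from torch.distributions import Normal, kl_divergence


class GaussianPolicy(nn.Module):
    """Diagonal Gaussian policy / prior network (illustrative sizes)."""

    def __init__(self, obs_dim, act_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * act_dim),
        )
        self.act_dim = act_dim

    def forward(self, obs):
        mean, log_std = self.net(obs).split(self.act_dim, dim=-1)
        return Normal(mean, log_std.clamp(-5, 2).exp())


def abm_prior_loss(prior, obs, act, adv):
    """Advantage-weighted behavior modelling: maximise the log-likelihood of
    batch actions, keeping only transitions whose advantage is positive
    (filter f(A) = 1[A > 0] is an assumption for this sketch)."""
    weight = (adv > 0).float()
    log_prob = prior(obs).log_prob(act).sum(-1)   # log pi_prior(a | s)
    return -(weight * log_prob).mean()


def policy_loss(policy, prior, critic, obs, kl_weight=1.0):
    """Improve the RL policy on a learned state-action critic while staying
    close (in KL) to the ABM prior instead of to the raw behavior policy."""
    pi = policy(obs)
    a = pi.rsample()                              # reparameterised sample
    q = critic(torch.cat([obs, a], dim=-1)).squeeze(-1)
    kl = kl_divergence(pi, prior(obs)).sum(-1)
    return (-q + kl_weight * kl).mean()
```

In this reading, the learned prior replaces the raw behavior data as the trust-region center, so the policy is only pulled towards previously executed actions that the advantage estimate marks as useful; the paper's actual policy-improvement step and choice of filter may differ from this simplified KL-penalized objective.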

