SYNTHESIZING PROGRAMMATIC POLICIES THAT INDUCTIVELY GENERALIZE

2020-01-02

Abstract

Deep reinforcement learning has successfully solved a number of challenging control tasks. However, learned policies typically have difficulty generalizing to novel environments. We propose an algorithm for learning programmatic state machine policies that can capture repeating behaviors. By doing so, they have the ability to generalize to instances requiring an arbitrary number of repetitions, a property we call inductive generalization. However, state machine policies are hard to learn since they consist of a combination of continuous and discrete structure. We propose a learning framework called adaptive teaching, which learns a state machine policy by imitating a teacher; in contrast to traditional imitation learning, our teacher adaptively updates itself based on the structure of the student. We show how our algorithm can be used to learn policies that inductively generalize to novel environments, whereas traditional neural network policies fail to do so.
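To make the idea concrete, here is a minimal illustrative sketch of a programmatic state machine policy, written independently of the paper's actual formulation. Each mode holds an affine controller `u = K x + b`, and guard predicates trigger mode transitions; a cycle among modes yields the repeating behavior that enables inductive generalization to arbitrarily long horizons. The class and the bouncing example below are hypothetical, not the authors' implementation.

```python
import numpy as np

class StateMachinePolicy:
    """Illustrative state-machine policy (not the paper's exact method).

    modes:       mode name -> (K, b), giving the affine action u = K @ x + b
    transitions: mode name -> list of (guard, next_mode); a guard is a
                 predicate on the state that fires a mode switch
    """
    def __init__(self, modes, transitions, start):
        self.modes = modes
        self.transitions = transitions
        self.mode = start

    def act(self, x):
        # Check the current mode's guards; take the first that fires.
        for guard, next_mode in self.transitions.get(self.mode, []):
            if guard(x):
                self.mode = next_mode
                break
        # Apply the current mode's affine controller.
        K, b = self.modes[self.mode]
        return K @ x + b

# A 1-D "bounce" task: move right until x >= 1, then left until x <= 0,
# repeating indefinitely -- the loop structure handles any number of
# repetitions, unlike a policy fit to a fixed horizon.
modes = {
    "right": (np.zeros((1, 1)), np.array([0.25])),   # constant step +0.25
    "left":  (np.zeros((1, 1)), np.array([-0.25])),  # constant step -0.25
}
transitions = {
    "right": [(lambda x: x[0] >= 1.0, "left")],
    "left":  [(lambda x: x[0] <= 0.0, "right")],
}
policy = StateMachinePolicy(modes, transitions, "right")

x = np.array([0.0])
for _ in range(100):
    x = x + policy.act(x)   # simple integrator dynamics
```

Because the repetition lives in the policy's control flow rather than in its parameters, the same program remains within the [0, 1] band no matter how many bounces the episode demands.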
