FREQUENCY- BASED SEARCH -CONTROL IN DYNA

资源分类

2020-01-02 |

192 |

52 |

Abstract

Model-based reinforcement learning has been empirically demonstrated as a successful strategy to improve sample efficiency. Particularly, Dyna architecture, as an elegant model-based architecture integrating learning and planning, provides huge flexibility of using a model. One of the most important components in Dyna is called search-control, which refers to the process of generating state or stateaction pairs from which we query the model to acquire simulated experiences. Search-control is critical to improve learning efficiency. In this work, we propose a simple and novel search-control strategy by searching high frequency region on value function. Our main intuition is built on Shannon sampling theorem from signal processing, which indicates that a high frequency signal requires more samples to reconstruct. We empirically show that a high frequency function is more difficult to approximate. This suggests a search-control strategy: we should use states in high frequency region of the value function to query the model to acquire more samples. We develop a simple strategy to locally measure the frequency of a function by gradient norm, and provide theoretical justification for this approach. We then apply our strategy to search-control in Dyna, and conduct experiments to show its property and effectiveness on benchmark domains.

上一篇：CLN2INV: LEARNING LOOP INVARIANTS WITHC ONTINUOUS LOGIC NETWORKS

下一篇：MAXMIN Q- LEARNING :C ONTROLLING THE ESTIMA -TION BIAS OF Q- LEARNING

用户评价

全部评价

还没有评论，说两句吧！

热门资源

The Variational S...

Unlike traditional images which do not offer in...
Learning to Predi...

Much of model-based reinforcement learning invo...
Stratified Strate...

In this paper we introduce Stratified Strategy ...
Learning to learn...

The move from hand-designed features to learned...
A Mathematical Mo...

Direct democracy, where each voter casts one vo...

智能在线

400-630-6780
聆听.建议反馈

E-mail: support@tusaishared.com