IS A GOOD REPRESENTATION SUFFICIENT FOR SAMPLE EFFICIENT REINFORCEMENT LEARNING?

2020-01-02

Abstract

Modern deep learning methods provide effective means to learn good representations. However, is a good representation itself sufficient for efficient reinforcement learning? This question is largely unexplored, and the extant body of literature mainly focuses on conditions which permit efficient reinforcement learning, with little understanding of which conditions are necessary for efficient reinforcement learning. This work provides strong negative results for reinforcement learning methods with function approximation for which a good representation (feature extractor) is known to the agent, focusing on natural representational conditions relevant to value-based learning and policy-based learning. For value-based learning, we show that even if the agent has a highly accurate linear representation, the agent still needs to sample an exponential number of trajectories in order to find a near-optimal policy. For policy-based learning, we show that even if the agent’s linear representation is capable of perfectly predicting the optimal action at any state, the agent still needs to sample an exponential number of trajectories in order to find a near-optimal policy. These lower bounds highlight the fact that having a good (value-based or policy-based) representation in and of itself is insufficient for efficient reinforcement learning and that additional assumptions are needed. In particular, these results provide new insights into why the analyses of existing provably efficient reinforcement learning methods make assumptions which are partly model-based in nature. Furthermore, our lower bounds also imply exponential separations in sample complexity between 1) value-based learning with a perfect representation and value-based learning with a good-but-not-perfect representation, 2) value-based learning and policy-based learning, 3) policy-based learning and supervised learning, and 4) reinforcement learning and imitation learning.
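
One common way to make "a highly accurate linear representation" precise is sketched below. It is an illustrative formalization under stated assumptions, not necessarily the exact definition used in the paper: the agent is given a feature map \phi from state-action pairs to d-dimensional vectors, and some unknown weight vector \theta yields a linear prediction that is uniformly within \delta of the optimal action-value function Q^*. The symbols \phi, \theta, d and \delta are notation introduced here for illustration only.

```latex
\exists\, \theta \in \mathbb{R}^d \ \text{such that}\quad
\bigl| Q^*(s,a) - \theta^\top \phi(s,a) \bigr| \le \delta
\quad \text{for all } (s,a) \in \mathcal{S} \times \mathcal{A}.
```

Under this reading, a perfect representation corresponds to \delta = 0 and a good-but-not-perfect representation to a small \delta > 0, which is the distinction behind the first separation listed in the abstract.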
