Abstract
Modern deep learning methods provide effective means to learn good representations. However, is a good representation itself sufficient for efficient reinforcement learning? This question is largely unexplored, and the extant body of literature mainly focuses on conditions which permit efficient reinforcement learning with little understanding of what are necessary conditions for efficient reinforcement learning. This work provides strong negative results for reinforcement learning methods with function approximation for which a good representation (feature extractor) is known to the agent, focusing on natural representational conditions relevant to value-based learning and policy-based learning. For value-based learning, we show that even if the agent has a highly accurate linear representation, the agent still needs to sample an exponential number trajectories in order to find a near-optimal policy. For policy-based learning, we show even if the agent’s linear representation is capable of perfectly predicting the optimal action at any state, the agent still needs to sample an exponential number of trajectories in order to find a near-optimal policy. These lower bounds highlight the fact that having a good (value-based or policybased) representation in and of itself is insufficient for efficient reinforcement learning and that additional assumptions are needed. In particular, these results provide new insights into why the analysis of existing provably efficient reinforcement learning methods make assumptions which are partly model-based in nature. Furthermore, our lower bounds also imply exponential separations on the sample complexity between 1) value-based learning with perfect representation and valuebased learning with a good-but-not-perfect representation, 2) value-based learning and policy-based learning, 3) policy-based learning and supervised learning and 4) reinforcement learning and imitation learning.