CIRL: Controllable Imitative Reinforcement Learning
for Vision-based Self-driving
Abstract. Autonomous urban driving navigation amid complex multi-agent dynamics remains under-explored due to the difficulty of learning an optimal driving policy. Traditional modular pipelines rely heavily on hand-designed rules and a
pre-processing perception system, while supervised learning-based models are
limited by the availability of extensive human driving experience. We present a general
and principled Controllable Imitative Reinforcement Learning (CIRL) approach
which successfully makes the driving agent achieve higher success rates based
on only vision inputs in a high-fidelity car simulator. To alleviate the low exploration efficiency in large continuous action spaces, which often prohibits the use of
classical RL on challenging real-world tasks, CIRL explores over a reasonably constrained action space guided by encoded experiences that imitate human demonstrations, building upon the Deep Deterministic Policy Gradient (DDPG) algorithm. Moreover,
we propose to specialize adaptive policies and steering-angle reward designs for
different control signals (i.e., follow, straight, turn right, turn left) based on
shared representations, improving the model's ability to handle diverse driving
cases. Extensive experiments on the CARLA driving benchmark demonstrate that
CIRL substantially outperforms all previous methods in terms of the percentage
of successfully completed episodes on a variety of goal-directed driving tasks.
We also show its superior generalization capability in unseen environments. To
our knowledge, this is the first successful case of a driving policy learned by
reinforcement learning in a high-fidelity simulator that outperforms
supervised imitation learning.
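The two key ideas of the abstract, a shared representation with per-command policy heads and exploration constrained around imitated actions, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the network shapes, the command names, the box-clipping constraint, and the `constrained_explore` helper are all illustrative choices.

```python
import numpy as np

# The four high-level control signals named in the abstract.
COMMANDS = ["follow", "straight", "turn_right", "turn_left"]

rng = np.random.default_rng(0)

class GatedPolicy:
    """Shared feature extractor with one action head per control command.

    The command selects which head produces the action, while all heads
    share the same learned representation (a sketch of the branched
    architecture the abstract describes).
    """

    def __init__(self, obs_dim=64, feat_dim=32, act_dim=2):
        self.W_shared = rng.normal(scale=0.1, size=(obs_dim, feat_dim))
        self.heads = {c: rng.normal(scale=0.1, size=(feat_dim, act_dim))
                      for c in COMMANDS}

    def act(self, obs, command):
        feat = np.tanh(obs @ self.W_shared)          # shared representation
        return np.tanh(feat @ self.heads[command])   # head gated by command

def constrained_explore(policy_action, imitation_action,
                        noise_scale=0.1, radius=0.2):
    """Add exploration noise but stay near the demonstrated action.

    Clipping to a box around the imitation output is one simple way to
    realize a "reasonably constrained action space"; the paper's exact
    mechanism may differ.
    """
    noisy = policy_action + rng.normal(scale=noise_scale,
                                       size=policy_action.shape)
    return np.clip(noisy,
                   imitation_action - radius,
                   imitation_action + radius)

policy = GatedPolicy()
obs = rng.normal(size=64)
a = policy.act(obs, "turn_left")                    # e.g. steering + throttle
a_explore = constrained_explore(a, imitation_action=a)
```

In a full DDPG training loop, `a_explore` would be executed in the simulator and the transition stored in a replay buffer; the per-head gating keeps gradients for each command from interfering with the others' specialized behavior.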