Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion
Forecasting with a Single Convolutional Net
Abstract
In this paper we propose a novel deep neural network
that is able to jointly reason about 3D detection, tracking and motion forecasting given data captured by a 3D
sensor. By jointly reasoning about these tasks, our holistic approach is more robust to occlusion as well as sparse
data at range. Our approach performs 3D convolutions
across space and time over a bird’s eye view representation of the 3D world, which is very efficient in terms of
both memory and computation. Our experiments on a new
very large-scale dataset captured in several North American
cities show that we can outperform the state of the art by a
large margin. Importantly, by sharing computation we can
perform all tasks in as little as 30 ms.
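
The core idea of convolving across space and time over a bird's eye view grid can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names `voxelize_bev` and `conv3d_valid`, the grid extents and cell sizes, the choice to treat height bins as channels, and the all-ones kernel. A real network would use learned weights and an optimized convolution routine.

```python
# Illustrative sketch; grid extents, cell sizes, and kernel weights are assumptions.

def voxelize_bev(points, x_range, y_range, z_range, cell, z_cell):
    """Binary occupancy grid indexed [z][x][y]; height bins act as channels."""
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    nz = int(round((z_range[1] - z_range[0]) / z_cell))
    grid = [[[0.0] * ny for _ in range(nx)] for _ in range(nz)]
    for x, y, z in points:
        ix = int((x - x_range[0]) // cell)
        iy = int((y - y_range[0]) // cell)
        iz = int((z - z_range[0]) // z_cell)
        # Points outside the chosen extents are simply dropped.
        if 0 <= ix < nx and 0 <= iy < ny and 0 <= iz < nz:
            grid[iz][ix][iy] = 1.0
    return grid


def conv3d_valid(frames, kernel):
    """Naive 'valid' 3D convolution over (time, x, y), summing over channels.

    Implemented as cross-correlation, as is conventional in deep learning.
    frames: [T][C][X][Y], kernel: [kt][C][kx][ky];
    returns a single output channel of shape [T-kt+1][X-kx+1][Y-ky+1].
    """
    T, C = len(frames), len(frames[0])
    X, Y = len(frames[0][0]), len(frames[0][0][0])
    kt, kx, ky = len(kernel), len(kernel[0][0]), len(kernel[0][0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for x in range(X - kx + 1):
            row = []
            for y in range(Y - ky + 1):
                acc = 0.0
                for dt in range(kt):
                    for c in range(C):
                        for dx in range(kx):
                            for dy in range(ky):
                                acc += (kernel[dt][c][dx][dy]
                                        * frames[t + dt][c][x + dx][y + dy])
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Two hypothetical sweeps: one point that moves one cell along x between frames.
sweeps = [[(0.5, 0.5, 0.5)], [(1.5, 0.5, 0.5)]]
frames = [voxelize_bev(p, (0, 4), (0, 4), (0, 2), 1.0, 1.0) for p in sweeps]
# All-ones kernel spanning 2 frames, 2 height channels, and a 3x3 spatial window.
kernel = [[[[1.0] * 3 for _ in range(3)] for _ in range(2)] for _ in range(2)]
out = conv3d_valid(frames, kernel)
```

Because the kernel spans two frames, each output cell aggregates occupancy from both sweeps, which is what lets a single filter respond to motion as well as shape. Keeping height as a channel dimension rather than a fourth spatial axis is one way to keep the grid compact, in line with the efficiency argument above.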