Deep MANTA: A Coarse-to-fine Many-Task Network for joint 2D and 3D vehicle
analysis from monocular image
Abstract
In this paper, we present a novel approach, called Deep
MANTA (Deep Many-Tasks), for many-task vehicle analysis from a given image. A robust convolutional network is
introduced for simultaneous vehicle detection, part localization, visibility characterization and 3D dimension estimation. Its architecture is based on a new coarse-to-fine
object proposal that boosts vehicle detection. Moreover,
the Deep MANTA network is able to localize vehicle parts
even if these parts are not visible. At inference time, the network’s outputs are used by a real-time robust pose estimation algorithm for fine orientation estimation and 3D vehicle localization. We show in experiments that our method
outperforms state-of-the-art monocular approaches on vehicle detection, orientation and 3D location tasks on the
very challenging KITTI benchmark.