Abstract
We address the problem of real-time 3D object detection from point clouds in the context of autonomous driving. Speed is critical because detection is a necessary component for safety. Existing approaches are, however, computationally expensive
due to the high dimensionality of point clouds.
We utilize the 3D data more efficiently by representing the
scene from the Bird’s Eye View (BEV), and propose PIXOR,
a proposal-free, single-stage detector that outputs oriented
3D object estimates decoded from pixel-wise neural network predictions. The input representation, network architecture, and model optimization are specially designed to
balance high accuracy and real-time efficiency. We validate
PIXOR on two datasets: the KITTI BEV object detection
benchmark, and a large-scale 3D vehicle detection benchmark. On both datasets we show that the proposed detector
notably surpasses other state-of-the-art methods in terms of
Average Precision (AP), while still running at 10 FPS.
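
To make the Bird's Eye View representation concrete, the following is a minimal sketch of how a point cloud might be discretized into a BEV occupancy grid. It is an illustration under stated assumptions, not the PIXOR implementation: the function name `points_to_bev_occupancy`, the parameter names, and the ranges and resolutions in the usage example are all hypothetical, and the abstract does not specify the actual input encoding.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Grid = List[List[List[int]]]


def points_to_bev_occupancy(
    points: Sequence[Point],
    x_range: Tuple[float, float],
    y_range: Tuple[float, float],
    z_range: Tuple[float, float],
    resolution: float,
    z_resolution: float,
) -> Grid:
    """Discretize 3D points into a binary BEV occupancy grid.

    Hypothetical helper for illustration only. The result is indexed as
    grid[ix][iy][iz], so the height slices act as channels of a 2D
    top-down image.
    """
    # Number of cells along each axis (assumed sizes, rounded to integers).
    nx = round((x_range[1] - x_range[0]) / resolution)
    ny = round((y_range[1] - y_range[0]) / resolution)
    nz = round((z_range[1] - z_range[0]) / z_resolution)

    grid: Grid = [[[0] * nz for _ in range(ny)] for _ in range(nx)]

    for x, y, z in points:
        ix = math.floor((x - x_range[0]) / resolution)
        iy = math.floor((y - y_range[0]) / resolution)
        iz = math.floor((z - z_range[0]) / z_resolution)
        # Drop points that fall outside the region of interest.
        if 0 <= ix < nx and 0 <= iy < ny and 0 <= iz < nz:
            grid[ix][iy][iz] = 1  # mark the cell as occupied

    return grid


# Usage: a tiny 2 x 2 x 1 grid with 1 m cells (illustrative values).
example_points = [(0.5, 0.5, 0.5), (1.5, 0.2, 0.1), (5.0, 0.0, 0.0)]
example_grid = points_to_bev_occupancy(
    example_points,
    x_range=(0.0, 2.0),
    y_range=(0.0, 2.0),
    z_range=(0.0, 1.0),
    resolution=1.0,
    z_resolution=1.0,
)
occupied_cells = sum(cell for plane in example_grid for row in plane for cell in row)
```

In this sketch the third point lies outside the region of interest and is discarded, so only two cells are marked occupied. Because the output is a fixed-size 2D grid with height slices as channels, it can be processed by a standard 2D convolutional network, which is the efficiency argument the abstract makes for the BEV representation.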