Abstract
We focus on the task of amodal 3D object detection in RGB-D images, which aims to produce a 3D bounding box of an object in metric form at its full extent. We introduce Deep Sliding Shapes, a 3D ConvNet formulation that takes a 3D volumetric scene from an RGB-D image as input and outputs 3D object bounding boxes. In our approach, we propose the first 3D Region Proposal Network (RPN) to learn objectness from geometric shapes and the first joint Object Recognition Network (ORN) to extract geometric features in 3D and color features in 2D. In particular, we handle objects of various sizes by training an amodal RPN at two different scales and an ORN to regress 3D bounding boxes. Experiments show that our algorithm outperforms the state of the art by 13.8 in mAP and is 200× faster than the original Sliding Shapes.
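To make the "3D volumetric scene from an RGB-D image" input concrete, the following is a minimal sketch of turning a depth map into a set of occupied voxels, assuming a pinhole camera model. The function name `depth_to_voxels`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the `voxel_size` parameter are illustrative assumptions, and a plain occupancy grid is used here only for simplicity; the abstract does not specify the volumetric encoding the method actually uses.

```python
import math
from typing import List, Set, Tuple


def depth_to_voxels(
    depth: List[List[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    voxel_size: float,
) -> Set[Tuple[int, int, int]]:
    """Back-project a depth map (in metres) and return occupied voxel indices.

    Assumes a pinhole camera; pixels with non-positive depth are treated
    as missing measurements and skipped.
    """
    occupied: Set[Tuple[int, int, int]] = set()
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # no valid depth reading at this pixel
            # Pinhole back-projection from pixel (u, v) to camera coordinates.
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            # Quantise the metric point to an integer voxel index.
            occupied.add(
                (
                    math.floor(x / voxel_size),
                    math.floor(y / voxel_size),
                    math.floor(z / voxel_size),
                )
            )
    return occupied
```

Because the grid is indexed in metric units, a 3D box predicted over it has a physical size, which is what lets a detector reason about an object's full extent rather than only its visible pixels.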