Abstract
The depth information of RGB-D sensors has greatly simplified some common challenges in computer vision and enabled breakthroughs for several tasks. In this paper, we propose to use depth maps for object detection and design a 3D detector to overcome the major difficulties for recognition, namely the variations of texture, illumination, shape, viewpoint, clutter, occlusion, self-occlusion and sensor noise. We take a collection of 3D CAD models and render each CAD model from hundreds of viewpoints to obtain synthetic depth maps. For each depth rendering, we extract features from the 3D point cloud and train an Exemplar-SVM classifier. During testing and hard-negative mining, we slide a 3D detection window in 3D space. Experimental results show that our 3D detector significantly outperforms the state-of-the-art algorithms for both RGB and RGB-D images, and achieves about a 1.7× improvement in average precision compared to DPM and R-CNN. All source code and data are available online.
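The sliding-window testing step described above can be illustrated with a minimal sketch. It is not the released implementation: the per-cell point count stands in for the point-cloud features used in the paper, the linear score `w·x + b` stands in for a trained Exemplar-SVM, and the names `window_feature`, `slide_3d_window`, `cells`, `bounds` and `stride` are hypothetical.

```python
from itertools import product


def window_feature(points, origin, size, cells):
    """Toy feature (an assumption): number of points in each cell of the window."""
    n = cells
    feat = [0.0] * (n * n * n)
    for p in points:
        idx = []
        for axis in range(3):
            # Position of the point relative to the window, scaled to [0, 1).
            rel = (p[axis] - origin[axis]) / size[axis]
            if not 0.0 <= rel < 1.0:
                break  # point lies outside the window on this axis
            idx.append(int(rel * n))
        else:
            # Reached only when the point is inside the window on all three axes.
            feat[(idx[0] * n + idx[1]) * n + idx[2]] += 1.0
    return feat


def slide_3d_window(points, weights, bias, size, cells, bounds, stride):
    """Score every window position with a linear classifier, best score first."""
    def steps(lo, hi, win):
        # Window origins along one axis such that the window stays in bounds.
        count = int((hi - lo - win) / stride + 1e-9) + 1
        return [lo + i * stride for i in range(max(count, 0))]

    axes = [steps(bounds[a][0], bounds[a][1], size[a]) for a in range(3)]
    detections = []
    for origin in product(*axes):
        feat = window_feature(points, origin, size, cells)
        score = sum(w * f for w, f in zip(weights, feat)) + bias
        detections.append((score, origin))
    return sorted(detections, reverse=True)
```

With one weight vector per exemplar, the same loop would be run once per exemplar and the scored windows pooled. Windows that score highly on scenes known not to contain the object would serve as hard negatives for retraining.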