Abstract
We propose a novel linear method to match cuboids in indoor scenes using RGBD images from Kinect. Beyond depth maps, these cuboids reveal important structures of a scene. Instead of directly fitting cuboids to 3D data, we first construct cuboid candidates using superpixel pairs on an RGBD image, and then we optimize the configuration of the cuboids to satisfy the global structure constraints. The optimal configuration has low local matching costs, small object intersection and occlusion, and the cuboids tend to project to a large region in the image; the number of cuboids is optimized simultaneously. We formulate the multiple cuboid matching problem as a mixed integer linear program and solve the optimization efficiently with a branch and bound method. The optimization guarantees the globally optimal solution. Our experiments on the Kinect RGBD images of a variety of indoor scenes show that our proposed method is efficient, accurate and robust against object appearance variations, occlusions and strong clutter.