Abstract
In recent years, the task of estimating the 6D pose of object instances and complete scenes, i.e. camera localization, from a single input image has received considerable attention. Consumer RGB-D cameras have made this feasible, even for difficult, texture-less objects and scenes. In this work, we show that a single RGB image is sufficient to achieve visually convincing results. Our key concept is to model and exploit the uncertainty of the system at all stages of the processing pipeline. The uncertainty comes in the form of continuous distributions over 3D object coordinates and discrete distributions over object labels. We give three technical contributions. Firstly, we develop a regularized, auto-context regression framework which iteratively reduces uncertainty in object coordinate and object label predictions. Secondly, we introduce an efficient way to marginalize object coordinate distributions over depth. This is necessary to deal with missing depth information. Thirdly, we utilize the distributions over object labels to detect multiple objects simultaneously with a fixed budget of RANSAC hypotheses. We tested our system for object pose estimation and camera localization on commonly used data sets. We see a major improvement over competing systems.