Abstract
Robust visual localization under a wide range of viewing conditions is a fundamental problem in computer vision. Handling the difficult cases of this problem is not only very challenging but also of high practical relevance, e.g., in the context of life-long localization for augmented reality or autonomous robots. In this paper, we propose a novel approach based on a joint 3D geometric and semantic understanding of the world, enabling it to succeed under conditions where previous approaches fail. Our method
leverages a novel generative model for descriptor learning, trained on semantic scene completion as an auxiliary task. The resulting 3D descriptors are robust to missing observations by encoding high-level 3D geometric and semantic information. Experiments on several challenging large-scale localization datasets demonstrate reliable localization under extreme viewpoint, illumination, and geometry changes.