Deep Virtual Stereo Odometry:
Leveraging Deep Depth Prediction for
Monocular Direct Sparse Odometry
Abstract. Monocular visual odometry approaches that rely purely on
geometric cues are prone to scale drift and require sufficient motion parallax in successive frames for motion estimation and 3D reconstruction.
In this paper, we propose to leverage deep monocular depth prediction
to overcome limitations of geometry-based monocular visual odometry.
To this end, we incorporate deep depth predictions into Direct Sparse
Odometry (DSO) as direct virtual stereo measurements. For depth prediction, we design a novel deep network that refines predicted depth from
a single image in a two-stage process. We train our network in a semi-supervised way on photoconsistency in stereo images and on consistency
with accurate sparse depth reconstructions from Stereo DSO. Our deep
predictions outperform state-of-the-art approaches for monocular depth estimation on the
KITTI benchmark. Moreover, our Deep Virtual Stereo Odometry clearly
exceeds previous monocular and deep-learning-based methods in accuracy. It even achieves performance comparable to
state-of-the-art stereo methods while relying on only a single camera.