Abstract. We present a system for keyframe-based dense camera tracking
and depth map estimation that is entirely learned. For tracking, we estimate
small pose increments between the current camera image and a synthetic
viewpoint. This significantly simplifies the learning problem and reduces
dataset bias with respect to camera motion. Further, we show that generating a large
number of pose hypotheses leads to more accurate predictions. For mapping,
we accumulate information in a cost volume centered at the current depth estimate. The mapping network then combines the cost volume and the keyframe
image to update the depth prediction, thereby effectively making use of depth
measurements and image-based priors. Our approach yields state-of-the-art
results with few images and is robust to noisy camera poses.
We demonstrate that the performance of our 6-DOF tracking competes with
RGB-D tracking algorithms. We compare favorably against strong classical and
deep-learning-based dense depth estimation algorithms.
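The mapping idea of accumulating matching costs in a volume centered at the current depth estimate can be illustrated for a single pixel with a minimal sketch. It assumes inverse-depth hypotheses spaced uniformly around the current estimate, per-frame costs averaged over all frames, and a soft-argmin read-out. The functions `centered_labels`, `accumulate_cost_volume` and `soft_argmin_depth` and the synthetic cost are hypothetical. The soft-argmin only stands in for the learned mapping network, which additionally uses the keyframe image.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical per-frame matching cost: (frame index, inverse depth) -> cost.
CostFn = Callable[[int, float], float]


def centered_labels(depth_current: float, num_labels: int, inv_step: float) -> List[float]:
    """Inverse-depth hypotheses placed symmetrically around 1 / depth_current (assumed layout)."""
    center = 1.0 / depth_current
    half = (num_labels - 1) / 2.0
    labels = [center + (k - half) * inv_step for k in range(num_labels)]
    if labels[0] <= 0.0:
        raise ValueError("inv_step too large: hypotheses reach zero or negative inverse depth")
    return labels


def accumulate_cost_volume(labels: Sequence[float], frames: Sequence[int], cost: CostFn) -> List[float]:
    """Average the per-frame matching cost for every depth hypothesis of one pixel."""
    return [sum(cost(f, lab) for f in frames) / len(frames) for lab in labels]


def soft_argmin_depth(labels: Sequence[float], costs: Sequence[float], temperature: float) -> float:
    """Stand-in read-out: softmax over negative costs, expectation taken in inverse depth."""
    c_min = min(costs)  # subtract the minimum for numerical stability
    weights = [math.exp(-(c - c_min) / temperature) for c in costs]
    inv_depth = sum(w * lab for w, lab in zip(weights, labels)) / sum(weights)
    return 1.0 / inv_depth


if __name__ == "__main__":
    true_inv_depth = 1.0 / 2.0  # synthetic ground truth: depth 2.0

    def synthetic_cost(frame: int, inv_depth: float) -> float:
        # Hypothetical cost, minimal at the true inverse depth for every frame.
        return abs(inv_depth - true_inv_depth)

    labels = centered_labels(depth_current=2.5, num_labels=9, inv_step=0.05)
    volume = accumulate_cost_volume(labels, frames=[0, 1, 2], cost=synthetic_cost)
    print(soft_argmin_depth(labels, volume, temperature=0.01))
```

Because the hypotheses follow the current estimate rather than spanning a fixed global depth range, a small number of labels can cover the region where the refined depth is likely to lie; this is the motivation the sketch tries to convey, not a description of the published label spacing.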