Abstract
The goal of our work is to complete the depth channel of
an RGB-D image. Commodity-grade depth cameras often
fail to sense depth for shiny, bright, transparent, and distant
surfaces. To address this problem, we train a deep network
that takes an RGB image as input and predicts dense surface normals and occlusion boundaries. Those predictions
are then combined with raw depth observations provided by
the RGB-D camera to solve for depths for all pixels, including those missing in the original observation. This method
was chosen over others (e.g., inpainting depths directly) based on extensive experiments with a new depth completion benchmark dataset, in which holes in the training data are filled by rendering surface reconstructions created from multiview RGB-D scans. Experiments with different network inputs, depth representations, loss functions,
optimization methods, inpainting methods, and deep depth
estimation networks show that our proposed approach provides better depth completions than these alternatives.
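The global step described above, solving for a dense depth map from sparse raw observations plus per-pixel predictions, can be posed as a sparse linear least-squares problem. The sketch below is a simplified illustration, not the paper's exact formulation: it keeps a data term that anchors observed depths and a smoothness term down-weighted across predicted occlusion boundaries, while the paper's full objective additionally constrains depth gradients against the predicted surface normals. The function name `complete_depth` and the `boundary_weight` map (1 in smooth regions, near 0 at predicted boundaries) are illustrative assumptions.

```python
import numpy as np
from scipy.sparse import lil_matrix, csr_matrix
from scipy.sparse.linalg import spsolve

def complete_depth(raw_depth, boundary_weight, lam=1.0):
    """Fill missing (zero) depths by minimizing
        sum_observed (d_i - d0_i)^2  +  lam * sum_edges w_e (d_i - d_j)^2,
    where w_e is small across predicted occlusion boundaries.
    Simplified stand-in for the paper's optimization (no normal term)."""
    h, w = raw_depth.shape
    n = h * w
    idx = lambda y, x: y * w + x
    A = lil_matrix((n, n))
    b = np.zeros(n)
    for y in range(h):
        for x in range(w):
            i = idx(y, x)
            if raw_depth[y, x] > 0:            # data term: anchor observed pixels
                A[i, i] += 1.0
                b[i] += raw_depth[y, x]
            for dy, dx in ((0, 1), (1, 0)):    # smoothness over 4-neighbor edges
                ny, nx = y + dy, x + dx
                if ny < h and nx < w:
                    j = idx(ny, nx)
                    # attenuate smoothing where a boundary is predicted
                    w_e = lam * min(boundary_weight[y, x], boundary_weight[ny, nx])
                    A[i, i] += w_e; A[j, j] += w_e
                    A[i, j] -= w_e; A[j, i] -= w_e
    return spsolve(csr_matrix(A), b).reshape(h, w)
```

Because the objective is quadratic, the solution is exact and unique whenever at least one depth observation reaches every connected region, which is why the paper can fill large holes that direct inpainting handles poorly.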