Abstract
In this paper we study the problem of monocular relative depth perception in the wild. We introduce a simple yet effective method to automatically generate dense relative depth annotations from web stereo images, and propose a new dataset that consists of diverse images as well as corresponding dense relative depth maps. Further, an improved ranking loss is introduced to deal with imbalanced ordinal relations, forcing the network to focus on a set of hard pairs. Experimental results demonstrate that our proposed approach not only achieves state-of-the-art accuracy in relative depth perception in the wild, but also benefits other dense per-pixel prediction tasks, e.g., metric depth estimation and semantic segmentation.
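To make the ranking-loss idea concrete, the following is a minimal sketch of a pairwise ranking loss with hard-pair selection. The exact formulation and the hard-pair selection ratio here are assumptions for illustration, not the paper's implementation: each pair carries an ordinal label r (+1/-1 for a depth ordering, 0 for roughly equal depth), and only the hardest (largest-loss) fraction of pairs contributes to the final loss.

```python
import math

def pairwise_ranking_loss(z1, z2, r):
    """Loss for one point pair with predicted depths z1, z2.

    r = +1 or -1 encodes the ground-truth ordinal relation
    (which point is farther); r = 0 means roughly equal depth.
    Ordered pairs use a logistic ranking term, equal pairs a
    squared-difference term.  (Illustrative form, an assumption.)
    """
    if r != 0:
        return math.log(1.0 + math.exp(-r * (z1 - z2)))
    return (z1 - z2) ** 2

def hard_pair_loss(pairs, keep_ratio=0.25):
    """Average loss over only the hardest fraction of pairs.

    `pairs` is an iterable of (z1, z2, r) triples; keeping only the
    largest per-pair losses focuses training on hard pairs, which
    counters the imbalance of ordinal relations.
    """
    losses = sorted(
        (pairwise_ranking_loss(z1, z2, r) for z1, z2, r in pairs),
        reverse=True,
    )
    k = max(1, int(len(losses) * keep_ratio))
    return sum(losses[:k]) / k
```

A correctly ordered pair incurs a small but nonzero loss, an inverted pair a large one, so the hard-pair average is dominated by pairs the network currently gets wrong.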