Abstract
Stereo matching algorithms usually consist of four steps: matching cost calculation, matching cost aggregation, disparity calculation, and disparity refinement. Existing CNN-based methods either apply CNNs to only some of these four steps or use different networks for different steps, which makes it difficult for them to obtain the overall optimal solution. In this paper, we propose a network architecture that incorporates all steps of stereo matching. The network consists of three parts. The first part calculates multi-scale shared features. The second part uses the shared features to perform matching cost calculation, matching cost aggregation, and disparity calculation, producing an initial disparity estimate. The initial disparity and the shared features are then used to calculate the feature constancy, which measures the correctness of the correspondence between the two input images. Finally, the initial disparity and the feature constancy are fed into a sub-network that refines the initial disparity. The proposed method has been evaluated on the Scene Flow and KITTI datasets. It achieves state-of-the-art performance on the KITTI 2012 and KITTI 2015 benchmarks while maintaining a very fast running time. Source code is available at http://github.com/leonzfa/iResNet
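The abstract describes the pipeline only at a high level, so the following is a minimal NumPy sketch under stated assumptions. It is not the iResNet implementation and is not taken from the linked repository. It assumes the shared features are already given as arrays of shape (C, H, W). It also uses a plain correlation cost volume followed by a soft-argmax as a stand-in for the learned disparity-estimation sub-network, with cost aggregation omitted. Finally, it takes one plausible form of feature constancy to be the per-pixel reconstruction error |F_L - warp(F_R, d)| between the left features and the right features warped by the initial disparity. All function names and the `temperature` parameter are hypothetical.

```python
import numpy as np


def correlation_volume(f_left, f_right, max_disp):
    """Matching cost (assumed form): mean channel-wise correlation between the
    left feature at x and the right feature at x - d, for d in [0, max_disp).
    Returns an array of shape (max_disp, H, W); larger values mean a better match."""
    _, h, w = f_left.shape
    vol = np.zeros((max_disp, h, w))
    for d in range(max_disp):
        # Columns x < d have no counterpart in the right image and stay at 0.
        vol[d, :, d:] = (f_left[:, :, d:] * f_right[:, :, : w - d]).mean(axis=0)
    return vol


def soft_argmax_disparity(vol, temperature=1.0):
    """Disparity calculation (hypothetical stand-in, not the paper's network):
    softmax over the disparity axis, then the expected disparity."""
    logits = vol / temperature
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=0, keepdims=True)
    candidates = np.arange(vol.shape[0]).reshape(-1, 1, 1)
    return (prob * candidates).sum(axis=0)


def warp_right_to_left(f_right, disp):
    """Resample right features at x - disp(y, x), linearly interpolating along x.
    Source coordinates outside the image are clamped to the border."""
    _, h, w = f_right.shape
    src_x = np.clip(np.arange(w)[None, :] - disp, 0, w - 1)
    x0 = np.floor(src_x).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    alpha = src_x - x0
    rows = np.arange(h)[:, None]
    return (1 - alpha) * f_right[:, rows, x0] + alpha * f_right[:, rows, x1]


def reconstruction_error(f_left, f_right, disp):
    """Feature-constancy term (assumed form): per-pixel mean absolute
    difference between left features and disparity-warped right features."""
    return np.abs(f_left - warp_right_to_left(f_right, disp)).mean(axis=0)


# Toy check: the left features are the right features shifted by 3 pixels.
rng = np.random.default_rng(0)
C, H, W, D, TRUE_D = 128, 8, 32, 8, 3
f_right = rng.standard_normal((C, H, W))
f_left = np.zeros_like(f_right)
f_left[:, :, TRUE_D:] = f_right[:, :, : W - TRUE_D]

init_disp = soft_argmax_disparity(correlation_volume(f_left, f_right, D), temperature=0.05)
err = reconstruction_error(f_left, f_right, init_disp)
valid = np.s_[:, D:]  # columns where every candidate disparity has a match
print("median |disp - 3| on valid pixels:", np.median(np.abs(init_disp[valid] - TRUE_D)))
print("mean reconstruction error on valid pixels:", err[valid].mean())
```

Where the initial disparity is correct, the error map is near zero. Where it is wrong, the error map is large. This is the kind of signal the abstract says is fed, together with the initial disparity, into the refinement sub-network. That learned refinement step is not sketched here.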