Abstract. Confidence measures for stereo have gained popularity in recent
years due to their improved capability to detect outliers and the increasing number of applications exploiting these cues. In this field, convolutional neural networks have achieved top performance compared to other
known techniques in the literature by processing local information to tell
correct disparity assignments from outliers. Despite these outstanding achievements, all approaches rely on clues extracted with small receptive fields,
thus ignoring most of the overall image content. Therefore, in this paper, we propose to exploit nearby and farther clues available from the image
and disparity domains to obtain a more accurate confidence estimation.
While local information is very effective for detecting high-frequency patterns, it lacks insights from farther regions in the scene. On the other
hand, enlarging the receptive field allows including clues from farther
regions but produces a smoother uncertainty estimation that is not particularly
accurate when dealing with high-frequency patterns. For these reasons,
we propose a multi-stage cascaded network to combine the
best of both worlds. Extensive experiments on three datasets using
three popular stereo algorithms show that the proposed framework outperforms state-of-the-art confidence estimation techniques.