Abstract
We propose a novel and principled hybrid CNN+CRF
model for stereo estimation. Our model exploits the
advantages of both convolutional neural networks (CNNs)
and conditional random fields (CRFs) in a unified approach. The CNNs compute expressive features for matching and distinctive color edges, which in turn are used to
compute the unary and binary costs of the CRF. For inference, we apply a recently proposed highly parallel dual
block descent algorithm which only needs a small fixed
number of iterations to compute a high-quality approximate minimizer. As the main contribution of the paper, we
propose a theoretically sound method based on the structured output support vector machine (SSVM) to train the hybrid CNN+CRF model on large-scale data end-to-end. Our
trained models perform very well even though we
use shallow CNNs and apply no postprocessing to the final output of the CRF. We evaluate our
combined models on challenging stereo benchmarks such
as Middlebury 2014 and KITTI 2015 and also investigate the
performance of each individual component.
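
To make the roles of the unary and binary costs concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes that the unary cost is the negative correlation of CNN features at a candidate disparity and that the binary cost is a truncated linear penalty on neighbouring disparities, weighted by edge-dependent terms. The function names `unary_costs` and `crf_energy`, the `trunc` parameter, and the zero cost for out-of-image matches are all hypothetical choices made for this example.

```python
import numpy as np

def unary_costs(feat_left, feat_right, max_disp):
    """Cost volume from feature maps of shape (H, W, C).

    Assumption: cost = negative feature correlation. Entries whose
    match would fall outside the right image keep a neutral cost of 0.
    """
    h, w, _ = feat_left.shape
    cost = np.zeros((h, w, max_disp))
    for d in range(max_disp):
        # Left pixel (y, x) is compared with right pixel (y, x - d).
        corr = np.sum(feat_left[:, d:] * feat_right[:, :w - d], axis=-1)
        cost[:, d:, d] = -corr
    return cost

def crf_energy(disp, cost, w_horiz, w_vert, trunc=2):
    """Energy of an integer disparity map on a 4-connected grid.

    w_horiz has shape (H, W-1) and w_vert has shape (H-1, W); in a
    hybrid model these weights would come from an edge-detecting CNN.
    """
    h, w = disp.shape
    ys, xs = np.indices((h, w))
    unary = cost[ys, xs, disp].sum()
    # Truncated linear penalty on disparity jumps between neighbours.
    jump_h = np.minimum(np.abs(disp[:, 1:] - disp[:, :-1]), trunc)
    jump_v = np.minimum(np.abs(disp[1:, :] - disp[:-1, :]), trunc)
    return unary + (w_horiz * jump_h).sum() + (w_vert * jump_v).sum()
```

Under these assumptions, inference amounts to searching for the disparity map with the lowest energy, and lowering a pairwise weight at a color edge makes a disparity jump at that location cheaper.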