Abstract
In the past year, convolutional neural networks have been shown to perform extremely well for stereo estimation. However, current architectures rely on siamese networks which exploit concatenation followed by further processing layers, requiring a minute of GPU computation per image pair. In contrast, in this paper we propose a matching network which is able to produce very accurate results in less than a second of GPU computation. Towards this goal, we exploit a product layer which simply computes the inner product between the two representations of a siamese architecture. We train our network by treating the problem as multi-class classification, where the classes are all possible disparities. This allows us to get calibrated scores, which result in much better matching performance when compared to existing approaches.
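
To make the product layer and the classification view concrete, the following is a minimal NumPy sketch for a single pixel. It is not the authors' implementation. The function names, the toy one-hot descriptors, and the use of a softmax with a cross-entropy loss are assumptions made for illustration; the abstract only states that matching is an inner product and that training is multi-class classification over disparities.

```python
import numpy as np

def disparity_scores(left_feat, right_feats):
    """Product layer: inner product of one left descriptor (D,) with the
    K right descriptors (K, D), one per candidate disparity."""
    return right_feats @ left_feat

def softmax(scores):
    """Turn raw scores into a distribution over the K candidate disparities."""
    shifted = scores - scores.max()  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

def cross_entropy(scores, true_disparity):
    """Multi-class classification loss (assumed here): negative log-probability
    of the true disparity class."""
    return -np.log(softmax(scores)[true_disparity])

# Toy data (hypothetical): K candidate disparities, D-dimensional descriptors.
K, D = 8, 16
right_feats = np.eye(K, D)                    # one distinct descriptor per disparity
true_disparity = 3
left_feat = right_feats[true_disparity].copy()  # left pixel matches disparity 3

scores = disparity_scores(left_feat, right_feats)
probs = softmax(scores)
predicted = int(np.argmax(probs))
loss = cross_entropy(scores, true_disparity)
```

In this sketch, comparing the left descriptor against all candidate disparities is a single matrix-vector product, with no further layers applied after the two branches are combined. This is consistent with the speed advantage the abstract claims over concatenation followed by further processing.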