Abstract. Deep Learning based stereo matching methods have shown
great successes and achieved top scores across different benchmarks.
However, like most data-driven methods, existing deep stereo matching networks suffer from some well-known drawbacks such as requiring
large amount of labeled training data, and that their performances are
fundamentally limited by the generalization ability. In this paper, we propose a novel Recurrent Neural Network (RNN) that takes a continuous
(possibly previously unseen) stereo video as input, and directly predicts
a depth-map at each frame without a pre-training process, and without the need of ground-truth depth-maps as supervision. Thanks to the
recurrent nature (provided by two convolutional-LSTM blocks), our network is able to memorize and learn from its past experiences, and modify
its inner parameters (network weights) to adapt to previously unseen or
unfamiliar environments. This suggests a remarkable generalization ability of the net, making it applicable in an open world setting. Our method
works robustly with changes in scene content, image statistics, and lighting and season conditions etc. By extensive experiments, we demonstrate
that the proposed method seamlessly adapts between different scenarios.
Equally important, in terms of the stereo matching accuracy, it outperforms state-of-the-art deep stereo approaches on standard benchmark
datasets such as KITTI and Middlebury stereo.