Abstract
Cross-spectral imaging provides strong benefits for
recognition and detection tasks. Often, multiple cameras
are used for cross-spectral imaging, thus requiring image
alignment, or disparity estimation in a stereo setting. Increasingly, multi-camera cross-spectral systems are embedded in active RGBD devices (e.g., RGB-NIR cameras in
Kinect and iPhone X). Hence, stereo matching also provides
an opportunity to obtain depth without an active projector
source. However, matching images from different spectral
bands is challenging because of large appearance variations. We develop a novel deep learning framework to simultaneously transform images across spectral bands and
estimate disparity. A material-aware loss function is incorporated within the disparity prediction network to handle regions with unreliable matching such as light sources,
glass windshields and glossy surfaces. No depth supervision is required by our method. To evaluate our method,
we used a vehicle-mounted RGB-NIR stereo system to collect 13.7 hours of video data across a range of areas in and
around a city. Experiments show that our method achieves
strong performance and reaches real-time speed.