Progressively Complementarity-aware Fusion Network
for RGB-D Salient Object Detection
Abstract
How to exploit cross-modal complementarity
sufficiently is the central question for RGB-D salient
object detection. Previous work mainly addresses this issue
by simply concatenating multi-modal features or
combining unimodal predictions. In this paper, we answer
this question from two perspectives: (1) We argue that if the
complementary part can be modelled more explicitly, the
cross-modal complement is likely to be better captured. To
this end, we design a novel complementarity-aware fusion
(CA-Fuse) module built on a Convolutional
Neural Network (CNN). By introducing cross-modal
residual functions and complementarity-aware
supervision in each CA-Fuse module, the problem of
learning complementary information from the paired
modality is explicitly posed as asymptotically
approximating the residual function. (2) We explore the
complement across all levels. By cascading
CA-Fuse modules and densely adding level-wise supervision
from deep to shallow, the cross-level complement can be
selected and combined progressively. The proposed
RGB-D fusion network disambiguates both cross-modal
and cross-level fusion processes and yields more
thorough fusion results. Experiments on public datasets
show the effectiveness of the proposed CA-Fuse module and
the RGB-D salient object detection network.
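
The two ideas above can be illustrated with a minimal sketch. It is not the network described in this paper: features are plain lists of floats rather than CNN feature maps, the residual function is a caller-supplied stand-in for a learned one, all levels are assumed to share one size, and the complementarity-aware and level-wise supervision terms are omitted. The names ca_fuse, progressive_fuse and residual_fn are hypothetical and chosen only for illustration.

```python
from typing import Callable, List

Vector = List[float]


def ca_fuse(rgb_feat: Vector, depth_feat: Vector,
            residual_fn: Callable[[Vector], Vector]) -> Vector:
    """Cross-modal residual fusion (illustrative only).

    The depth branch contributes a residual that is added to the RGB
    features, so the fusion step only has to model what RGB is missing.
    """
    residual = residual_fn(depth_feat)  # stand-in for a learned residual function
    return [r + d for r, d in zip(rgb_feat, residual)]


def progressive_fuse(rgb_levels: List[Vector], depth_levels: List[Vector],
                     residual_fn: Callable[[Vector], Vector]) -> List[Vector]:
    """Cascade ca_fuse from the deepest level to the shallowest.

    Levels are ordered shallow -> deep. Assumption: every level has the
    same length, so the deeper result can be added without upsampling.
    """
    carried = None
    outputs = []
    for rgb_feat, depth_feat in zip(reversed(rgb_levels), reversed(depth_levels)):
        fused = ca_fuse(rgb_feat, depth_feat, residual_fn)
        if carried is not None:
            # Combine with the result carried down from the deeper level.
            fused = [f + c for f, c in zip(fused, carried)]
        carried = fused
        outputs.append(fused)
    return outputs  # ordered deep -> shallow


if __name__ == "__main__":
    def halve(v: Vector) -> Vector:
        return [0.5 * x for x in v]

    print(ca_fuse([1.0, 2.0], [2.0, 4.0], halve))
    print(progressive_fuse([[1.0], [2.0]], [[2.0], [4.0]], halve))
```

In a real network, each output of progressive_fuse would feed a saliency prediction with its own supervision; here the outputs are simply returned in deep-to-shallow order.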