Abstract
We present a descriptor, called fully convolutional self-similarity (FCSS), for dense semantic correspondence. To
robustly match points among different instances within the
same object class, we formulate FCSS using local self-similarity (LSS) within a fully convolutional network. In
contrast to existing CNN-based descriptors, FCSS is inherently insensitive to intra-class appearance variations because of its LSS-based structure, while maintaining the precise localization ability of deep neural networks. The sampling patterns of local structure and the self-similarity measure are jointly learned within the proposed network in
an end-to-end and multi-scale manner. As training data
for semantic correspondence is rather limited, we propose
to leverage object candidate priors provided in existing
image datasets, together with correspondence consistency between object pairs, to enable weakly-supervised learning.
Experiments demonstrate that FCSS outperforms conventional handcrafted descriptors and CNN-based descriptors
on various benchmarks.
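
To illustrate why a self-similarity structure can be insensitive to appearance changes, the following is a minimal handcrafted sketch, not the learned FCSS network. It assumes a grayscale image, a fixed set of hypothetical offset pairs (`pairs`), raw intensity patches instead of convolutional features, and a fixed `bandwidth`; in FCSS the sampling patterns and the similarity measure are learned. The descriptor compares pairs of patches around a center pixel, so inverting the image intensities leaves it unchanged.

```python
import numpy as np

def self_similarity_descriptor(image, center, offset_pairs,
                               patch_radius=1, bandwidth=1.0):
    """Toy local self-similarity descriptor at one pixel (illustrative only).

    Each descriptor entry is exp(-SSD / bandwidth) between two patches
    sampled at fixed offsets from the center pixel.
    """
    def patch(y, x):
        r = patch_radius
        return image[y - r:y + r + 1, x - r:x + r + 1]

    cy, cx = center
    values = []
    for (sy, sx), (ty, tx) in offset_pairs:
        a = patch(cy + sy, cx + sx)
        b = patch(cy + ty, cx + tx)
        ssd = np.sum((a - b) ** 2)  # sum of squared differences
        values.append(np.exp(-ssd / bandwidth))
    return np.array(values)

# Hypothetical sampling pattern: three pairs of offsets (dy, dx).
pairs = [((-2, 0), (2, 0)), ((0, -2), (0, 2)), ((-2, -2), (2, 2))]

rng = np.random.default_rng(0)
image = rng.random((16, 16))

d_original = self_similarity_descriptor(image, (8, 8), pairs)
# Inverting intensities changes appearance but not patch-to-patch differences.
d_inverted = self_similarity_descriptor(1.0 - image, (8, 8), pairs)
```

Here `d_original` and `d_inverted` agree up to floating-point rounding, because the squared difference between two patches is unaffected when both are inverted. A descriptor built directly from intensities would not have this property.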