Abstract
This work aims to establish pixel-to-pixel correspondences
between images from the same visual class but with different
geometries and appearances. This task is particularly
challenging because (i) the visual content of such images is similar only in its high-level structure, and (ii) background clutter
constantly introduces noise.
To address these problems, this paper proposes an
object-aware method that estimates per-pixel correspondences
from the semantic level down to the low level, by learning a classifier for each
selected discriminative grid cell and guiding the localization of every pixel under this semantic constraint. Specifically, an Object-aware Hierarchical Graph (OHG) model is constructed to regulate matching consistency from a
coarse grid cell containing whole object(s), to finer grid cells covering smaller semantic elements, and finally to every
pixel. A guidance layer is introduced to impose the semantic constraint on local structure matching. In addition, we propose
to learn the salient high-level structure of each grid cell
in an “objectness-driven” way, as an alternative to hand-crafted descriptors for defining a better measure of visual similarity.
The proposed method has been extensively evaluated on
various challenging benchmarks and real-world images.
The results show that our method significantly outperforms
the state of the art in terms of semantic flow accuracy.