Abstract
View-based methods have achieved considerable success
in 3D object recognition tasks. Unlike existing view-based methods, which pool view-wise features, we tackle
this problem from the perspective of patches-to-patches similarity measurement. By exploiting the relationship between
the polynomial kernel and bilinear pooling, we obtain an effective 3D object representation by aggregating local convolutional features through bilinear pooling. Meanwhile,
we harmonize the different components inherent in the bilinear feature to obtain a more discriminative representation.
To achieve an end-to-end trainable framework, we incorporate the harmonized bilinear pooling as a layer of a network, constituting the proposed Multi-view Harmonized Bilinear Network (MHBN). Systematic experiments conducted
on two public benchmark datasets demonstrate the efficacy
of the proposed method in 3D object recognition.
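
The relationship between the polynomial kernel and bilinear pooling can be illustrated with a small numerical check. The sketch below is not the MHBN implementation. It assumes a homogeneous degree-2 polynomial kernel and plain sum-of-outer-products pooling, and it omits the harmonizing step entirely. The function names and the random "patch" features are hypothetical, chosen only for illustration. Under these assumptions, summing the kernel over all patch pairs from two objects gives the same value as the inner product of their bilinear-pooled features.

```python
import math
import random


def bilinear_pool(patches):
    """Sum of outer products x x^T over a set of local feature vectors."""
    d = len(patches[0])
    pooled = [[0.0] * d for _ in range(d)]
    for x in patches:
        for a in range(d):
            for b in range(d):
                pooled[a][b] += x[a] * x[b]
    return pooled


def frobenius_inner(A, B):
    """Element-wise inner product of two matrices of the same shape."""
    return sum(a * b for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))


def poly2_patch_similarity(X, Y):
    """Sum of (x . y)^2 over every patch pair (assumed degree-2 kernel)."""
    total = 0.0
    for x in X:
        for y in Y:
            dot = sum(xa * ya for xa, ya in zip(x, y))
            total += dot ** 2
    return total


def random_patches(rng, n, d):
    """Hypothetical stand-in for local convolutional features."""
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


rng = random.Random(0)
X = random_patches(rng, n=5, d=4)  # patches of one object
Y = random_patches(rng, n=6, d=4)  # patches of another object

kernel_side = poly2_patch_similarity(X, Y)
pooled_side = frobenius_inner(bilinear_pool(X), bilinear_pool(Y))
print(math.isclose(kernel_side, pooled_side, rel_tol=1e-9))
```

The pooled side costs one pass over each patch set plus a single matrix inner product, whereas the kernel side visits every patch pair. This is one reason a pooled representation is convenient for comparing objects.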