Abstract. Fine-grained visual recognition is challenging because it relies heavily on modeling diverse semantic parts and on learning fine-grained features. Bilinear pooling based models have been shown to be effective at fine-grained recognition, but most previous approaches neglect
the fact that inter-layer part feature interaction and fine-grained feature
learning are mutually correlated and can reinforce each other. In this
paper, we present a novel model to address these issues. First, a cross-layer bilinear pooling approach is proposed to capture the inter-layer
part feature relations, which results in superior performance compared
with other bilinear pooling based approaches. Second, we propose a novel
hierarchical bilinear pooling framework to integrate multiple cross-layer
bilinear features to enhance their representation capability. Our formulation is intuitive and efficient, and it achieves state-of-the-art results on
widely used fine-grained recognition datasets.
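
To make the two ideas above concrete, the following is a minimal NumPy sketch and not the implementation used in this paper. It assumes that a cross-layer bilinear feature is the location-averaged outer product of feature maps taken from two different layers with the same spatial size, followed by signed square-root and L2 normalization (a common post-processing choice for bilinear features), and that the hierarchical representation is the concatenation of such features over all layer pairs. The function names `cross_layer_bilinear` and `hierarchical_bilinear` are hypothetical.

```python
import numpy as np


def cross_layer_bilinear(x, y):
    """Bilinear feature between two layers (illustrative assumption).

    x: array of shape (C1, H, W), feature map from one layer.
    y: array of shape (C2, H, W), feature map from another layer
       with the same spatial size.
    Returns a unit-norm vector of length C1 * C2.
    """
    c1, h, w = x.shape
    c2 = y.shape[0]
    xf = x.reshape(c1, h * w)
    yf = y.reshape(c2, h * w)
    # Average over spatial locations of the outer product x_l y_l^T.
    b = xf @ yf.T / (h * w)
    z = b.reshape(-1)
    # Signed square root, then L2 normalization.
    z = np.sign(z) * np.sqrt(np.abs(z))
    return z / (np.linalg.norm(z) + 1e-12)


def hierarchical_bilinear(feature_maps):
    """Concatenate cross-layer bilinear features over all layer pairs."""
    n = len(feature_maps)
    parts = [
        cross_layer_bilinear(feature_maps[i], feature_maps[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    return np.concatenate(parts)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three hypothetical layers, each with 4 channels on a 5x5 grid.
    layers = [rng.standard_normal((4, 5, 5)) for _ in range(3)]
    feature = hierarchical_bilinear(layers)
    print(feature.shape)
```

With three layers there are three layer pairs, so the concatenated vector has 3 × 4 × 4 entries; in practice it would be passed to a linear classifier.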