Abstract
Pooling second-order local feature statistics to form a high-dimensional bilinear feature has been shown to achieve state-of-the-art performance on a variety of fifinegrained classifification tasks. To address the computational demands of high feature dimensionality, we propose to represent the covariance features as a matrix and apply a lowrank bilinear classififier. The resulting classififier can be evaluated without explicitly computing the bilinear feature map which allows for a large reduction in the compute time as well as decreasing the effective number of parameters to be learned. To further compress the model, we propose a classi- fifier co-decomposition that factorizes the collection of bilinear classififiers into a common factor and compact perclass terms. The co-decomposition idea can be deployed through two convolutional layers and trained in an endto-end architecture. We suggest a simple yet effective initialization that avoids explicitly fifirst training and factorizing the larger bilinear classififiers. Through extensive experiments, we show that our model achieves state-of-theart performance on several public datasets for fifine-grained classifification trained with only category labels. Importantly, our fifinal model is an order of magnitude smaller than the recently proposed compact bilinear model [8], and three orders smaller than the standard bilinear CNN model [19].