Abstract. Fine-Grained Visual Classification (FGVC) datasets contain
small sample sizes, along with significant intra-class variation and inter-class similarity. While prior work has addressed intra-class variation using
localization and segmentation techniques, inter-class similarity may also
affect feature learning and reduce classification performance. In this work,
we address this problem using a novel optimization procedure for the
end-to-end neural network training on FGVC tasks. Our procedure, called
Pairwise Confusion (PC), reduces overfitting by intentionally introducing
confusion in the activations. With PC regularization, we obtain state-of-the-art performance on six of the most widely-used FGVC datasets and
demonstrate improved localization ability. PC is easy to implement, does
not need excessive hyperparameter tuning during training, and does not
add significant overhead during test time.
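To make the idea concrete, the sketch below shows one plausible form such a confusion term could take: penalizing the squared Euclidean distance between the predicted class distributions of a pair of training samples, added to the usual cross-entropy loss. This is a minimal illustration, not the paper's exact formulation; the function names and the weight `lam` are hypothetical.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def euclidean_confusion(logits_a, logits_b):
    # Squared Euclidean distance between the predicted probability
    # distributions of two samples. Minimizing it pulls the two
    # predictions together, i.e. "confuses" the network on the pair.
    p_a, p_b = softmax(logits_a), softmax(logits_b)
    return float(np.sum((p_a - p_b) ** 2))

def pairwise_confusion_loss(logits_a, label_a, logits_b, label_b, lam=1.0):
    # Total loss for one pair: cross-entropy on both samples plus a
    # weighted confusion penalty (lam is a hypothetical trade-off weight).
    def cross_entropy(logits, label):
        return -float(np.log(softmax(logits)[label]))
    ce = cross_entropy(logits_a, label_a) + cross_entropy(logits_b, label_b)
    return ce + lam * euclidean_confusion(logits_a, logits_b)
```

Because the penalty depends only on the network's outputs for sampled pairs, it adds no parameters and no cost at test time, consistent with the overhead claim above.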