Abstract. We propose a novel attention mechanism to enhance Convolutional Neural Networks for fine-grained recognition. It learns to attend
to lower-level feature activations without requiring part annotations and
uses these activations to update and rectify the output likelihood distribution. In contrast to other approaches, the proposed mechanism is modular, architecture-independent and efficient both in terms of parameters
and computation required. Experiments show that networks augmented
with our approach systematically improve their classification accuracy
and become more robust to clutter. As a result, Wide Residual Networks
augmented with our proposal surpass the state-of-the-art classification
accuracies on CIFAR-10, the Adience gender recognition task, Stanford
Dogs, and UEC Food-100.
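The rectification idea described above can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: it assumes a single spatial attention map computed from an intermediate feature map via 1x1-convolution-style weights, attention-weighted pooling, and a linear head whose corrective logits are averaged with the backbone's output. All names and shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a flat vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_head(feats, w_attn, w_cls):
    # feats: (C, H, W) lower-level feature activations (assumed shape).
    c, h, w = feats.shape
    # 1x1-conv-style projection to per-location attention scores: (H, W).
    scores = np.tensordot(w_attn, feats, axes=([0], [0]))
    # Normalize over all spatial positions to get an attention map.
    a = softmax(scores.ravel()).reshape(h, w)
    # Attention-weighted spatial pooling -> a (C,) descriptor.
    pooled = (feats * a).sum(axis=(1, 2))
    # Linear classifier on the pooled descriptor -> corrective logits.
    return w_cls @ pooled

rng = np.random.default_rng(0)
feats = rng.standard_normal((64, 8, 8))    # dummy intermediate activations
w_attn = rng.standard_normal(64)           # attention projection weights
w_cls = rng.standard_normal((10, 64))      # per-class weights, 10 classes
backbone_logits = rng.standard_normal(10)  # dummy backbone output

# Rectify the backbone's class scores with the attention head's output
# (simple averaging here; the actual combination rule is an assumption).
rectified = (backbone_logits + attention_head(feats, w_attn, w_cls)) / 2
print(rectified.shape)
```

Because the head only adds a projection vector and one linear layer per attended feature map, it remains cheap in both parameters and computation relative to the backbone, consistent with the efficiency claim above.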