Abstract. Fine-grained classification is challenging due to the difficulty
of finding discriminative features. Finding those subtle traits that fully
characterize the object is not straightforward. To address this, we propose a novel self-supervision mechanism that effectively localizes informative regions without the need for bounding-box/part annotations. Our model, termed NTS-Net for Navigator-Teacher-Scrutinizer
Network, consists of a Navigator agent, a Teacher agent and a Scrutinizer
agent. Exploiting the intrinsic consistency between the informativeness
of a region and its probability of belonging to the ground-truth class, we design
a novel training paradigm that enables the Navigator to detect the most informative regions under the guidance of the Teacher. The Scrutinizer then scrutinizes the regions proposed by the Navigator and makes predictions. Our model can be viewed as a multi-agent cooperation, wherein
the agents benefit from each other and make progress together. NTS-Net can
be trained end-to-end, while providing accurate fine-grained classification
predictions as well as highly informative regions during inference. We
achieve state-of-the-art performance on extensive benchmark datasets.
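
The consistency idea described above, that regions the Navigator scores as more informative should also be the ones the Teacher finds more likely to belong to the ground-truth class, can be illustrated with a pairwise ranking loss. The following is only a minimal sketch under stated assumptions: the function name `ranking_consistency_loss`, the hinge form, and the `margin` parameter are illustrative choices and are not taken from the abstract.

```python
def ranking_consistency_loss(informativeness, confidence, margin=1.0):
    """Pairwise hinge loss over region scores (illustrative sketch).

    informativeness: Navigator scores, one per proposed region (assumed input).
    confidence: Teacher probabilities of the ground-truth class for the
        same regions (assumed input).
    For every pair (i, s) where region s has higher confidence than region i,
    the loss is zero only if s is also scored more informative than i by at
    least `margin`.
    """
    if len(informativeness) != len(confidence):
        raise ValueError("both score lists must have one entry per region")

    loss = 0.0
    n = len(informativeness)
    for i in range(n):
        for s in range(n):
            if confidence[i] < confidence[s]:
                gap = informativeness[s] - informativeness[i]
                loss += max(0.0, margin - gap)
    return loss


# Teacher confidences for three hypothetical regions.
teacher_conf = [0.9, 0.1, 0.5]

# Navigator scores ordered like the confidences: no penalty.
aligned = ranking_consistency_loss([3.0, 0.0, 1.5], teacher_conf)

# Navigator scores in the opposite order: every pair is penalized.
reversed_order = ranking_consistency_loss([0.0, 3.0, 1.5], teacher_conf)
```

In this sketch the loss depends only on the relative order of the scores, so the Navigator is free to use any score scale as long as its ranking agrees with the Teacher's.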