Abstract

Fine-grained object retrieval has recently attracted extensive research attention. State-of-the-art schemes are typically built on convolutional neural network (CNN) features. Despite considerable progress, two issues remain open. On one hand, deep features are extracted coarsely at the image level rather than precisely at the object level, so they are corrupted by background clutter. On the other hand, training CNN features with a standard triplet loss is time-consuming and ill-suited to learning discriminative features. In this paper, we present a novel fine-grained object retrieval scheme that addresses both issues in a unified framework. First, we introduce a novel centralized ranking loss (CRL), which achieves very efficient (a 1,000-fold training speedup compared with the triplet loss) and discriminative feature learning through a “centralized” global pooling. Second, we propose a weakly supervised attractive feature extraction, which segments object contours using top-down saliency. The contours are then integrated into the CNN response map to extract features precisely “within” the target object. Interestingly, we find that CRL and weakly supervised learning reinforce each other when combined. We evaluate the proposed scheme on widely used benchmarks, including CUB200-2011 and CARS196. We report significant gains over state-of-the-art schemes, e.g., 5.4% over SCDA [Wei et al., 2017] on CARS196, and 3.7% on CUB200-2011.
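To give some intuition for why ranking against class centers can be far cheaper than ranking over triplets, the following is a minimal sketch. It is not the CRL formulation of this paper, which the abstract does not specify. It assumes that each class center is the mean of that class's feature vectors and that the loss is a hinge over Euclidean distances to the own-class center and to every other center. The function names (`class_centers`, `center_ranking_loss`, `count_triplets`) and the `margin` parameter are hypothetical and introduced only for illustration.

```python
import math
from collections import defaultdict


def class_centers(features, labels):
    """Mean feature vector per class (an assumed form of 'centralized' pooling)."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def center_ranking_loss(features, labels, margin=1.0):
    """Hinge ranking loss against class centers (illustrative assumption).

    Each sample should be closer to its own class center than to any other
    class center by at least `margin`. Returns (mean loss, number of terms).
    """
    centers = class_centers(features, labels)
    total, terms = 0.0, 0
    for x, y in zip(features, labels):
        d_pos = math.dist(x, centers[y])
        for other, center in centers.items():
            if other == y:
                continue
            d_neg = math.dist(x, center)
            total += max(0.0, margin + d_pos - d_neg)
            terms += 1
    return total / max(terms, 1), terms


def count_triplets(labels):
    """Number of ordered (anchor, positive, negative) triplets in a label set."""
    counts = defaultdict(int)
    for y in labels:
        counts[y] += 1
    n = len(labels)
    return sum(k * (k - 1) * (n - k) for k in counts.values())


if __name__ == "__main__":
    feats = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
    labs = [0, 0, 1, 1]
    loss, n_terms = center_ranking_loss(feats, labs)
    print("center-based terms:", n_terms, "loss:", loss)
    print("exhaustive triplets:", count_triplets(labs))
```

In this sketch the number of center-based terms grows as (samples) × (classes − 1), whereas the number of exhaustive triplets grows roughly cubically with the number of samples. That gap is one plausible source of a large training speedup, although the figure reported above comes from the paper's own method and experiments, not from this sketch.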