Abstract
Learning fine-grained image similarity is a challenging task. It needs to capture both between-class and within-class image differences. This paper proposes a deep ranking model that employs deep learning techniques to learn a similarity metric directly from images. It has higher learning capability than models based on hand-crafted features. A novel multiscale network structure has been developed to describe the images effectively. An efficient triplet sampling algorithm is proposed to learn the model with distributed asynchronous stochastic gradient. Extensive experiments show that the proposed algorithm outperforms models based on hand-crafted visual features and deep classification models.