Abstract
In this study, we present a weakly supervised approach that discovers the discriminative structures of sketch images, given pairs of sketch images and web images. In contrast to traditional approaches that use global appearance features or rely on keypoint features, our aim is to automatically learn the shared latent structures that exist between sketch images and real images, even when there are significant appearance differences across the relevant real images. To accomplish this, we propose a deep convolutional neural network, named SketchNet. We first construct a triplet composed of a sketch, a positive real image, and a negative real image as the input of our neural network. To discover the coherent visual structures between the sketch and its positive pairs, we introduce the softmax as the loss function. Then a ranking mechanism is introduced so that the positive pairs obtain a higher score than the negative ones, which yields a more robust representation. Finally, we formalize the above-mentioned constraints into a unified objective function, and create an ensemble feature representation to describe the sketch images. Experiments on the TU-Berlin sketch benchmark demonstrate the effectiveness of our model and show that deep feature representation brings substantial improvements over other state-of-the-art methods on sketch classification.
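As a rough illustration of how a softmax term and a ranking term can be combined into one objective over a (sketch, positive, negative) triplet, consider the minimal sketch below. It is not the SketchNet formulation: the abstract does not specify the exact form of either term, so the hinge-style ranking loss, the `margin` and `weight` parameters, and all function names here are assumptions made for illustration only.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of softmax(logits) against the true class index `label`."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def ranking_hinge(pos_score, neg_score, margin=1.0):
    """Assumed hinge form: zero once the positive pair outscores
    the negative pair by at least `margin`."""
    return max(0.0, margin - (pos_score - neg_score))


def triplet_objective(logits, label, pos_score, neg_score,
                      margin=1.0, weight=1.0):
    """Hypothetical unified objective: softmax term plus weighted ranking term."""
    return (softmax_cross_entropy(logits, label)
            + weight * ranking_hinge(pos_score, neg_score, margin))


# Example triplet: class logits for the sketch, plus matching scores for the
# (sketch, positive) and (sketch, negative) pairs. All values are made up.
loss = triplet_objective([2.0, 0.5, -1.0], label=0,
                         pos_score=0.5, neg_score=2.0)
print(loss)
```

In this sketch the ranking term contributes only when the negative pair scores too close to, or above, the positive pair, so a triplet that is already well ordered is penalized by the softmax term alone.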