Abstract
We present an approach to matching images of objects in fine-grained datasets without using part annotations, with an application to the challenging problem of weakly supervised single-view reconstruction. This is in contrast to prior works that require part annotations, since matching objects across class and pose variations is challenging with appearance features alone. We overcome this challenge through a novel deep learning architecture, WarpNet, that aligns an object in one image with a different object in another. We exploit the structure of the fine-grained dataset to create artificial data for training this network in an unsupervised-discriminative learning approach. The output of the network acts as a spatial prior that allows generalization at test time to match real images across variations in appearance, viewpoint and articulation. On the CUB-200-2011 dataset of bird categories, we improve the AP over an appearance-only network by 13.6%. We further demonstrate that our WarpNet matches, together with the structure of fine-grained datasets, allow single-view reconstructions with quality comparable to using annotated point correspondences.