Abstract. In this paper, we propose a semi-supervised approach to representative selection, which finds a small set of representatives that summarizes a large data collection well. Given labeled source data and large unlabeled target data, we aim to find representatives in the target data that not only represent and associate data points belonging to each labeled category, but also discover novel categories in the target data, if any. To leverage the labeled source data, we guide representative selection from the labeled source to the unlabeled target. We propose a joint optimization framework that alternately optimizes (1) representative selection in the target data and (2) discriminative feature learning from both the source and the target for better representative selection. Experiments on image and video datasets demonstrate that our proposed approach not only finds better representatives, but also discovers novel categories in the target data that are not in the source.