Abstract
We address the problem of joint detection and segmentation of multiple object instances in an image, a key step towards scene understanding. Inspired by data-driven methods, we propose an exemplar-based approach to the task of instance segmentation, in which a set of reference image/shape masks is used to find multiple objects. We design a novel CRF framework that jointly models object appearance, shape deformation, and object occlusion. To tackle the challenging MAP inference problem, we derive an alternating procedure that interleaves object segmentation and shape/appearance adaptation. We evaluate our method on two datasets with instance labels and show promising results.