Abstract
This paper addresses the well-established problem of unsupervised object discovery with a novel method inspired by weakly-supervised approaches. In particular, the ability of an object patch to predict the rest of the object (its context) is used as a supervisory signal to help discover visually consistent object clusters. The main contributions of this work are: 1) framing unsupervised clustering as a leave-one-out context prediction task; 2) evaluating the quality of context prediction by statistical hypothesis testing between thing and stuff appearance models; and 3) an iterative region prediction and context alignment approach that gradually discovers a visual object cluster together with a segmentation mask and fine-grained correspondences. The proposed method outperforms previous unsupervised as well as weakly-supervised object discovery approaches, and is shown to provide correspondences detailed enough to transfer keypoint annotations.