Abstract. This paper presents a weakly supervised approach to object
instance segmentation. Starting with known or predicted object bounding boxes, we learn object masks by playing a game of cut-and-paste in
an adversarial learning setup. A mask generator takes a detection box
and Faster R-CNN features, and constructs a segmentation mask that is
used to cut and paste the object into a new image location. The discriminator tries to distinguish between real objects and those cut and pasted
via the generator, giving a learning signal that leads to improved object
masks. We verify our method experimentally using Cityscapes, COCO,
and aerial image datasets, learning to segment objects without ever having seen a mask in training. Our method exceeds the performance of
existing weakly supervised methods, without requiring hand-tuned segment proposals, and reaches 90% of supervised performance.
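
The cut-and-paste step described above can be illustrated with a minimal sketch. It assumes the paste is a per-pixel alpha blend of the boxed object onto the target image, with a soft mask in [0, 1]; the abstract does not specify the blending rule, so this is an assumption. The function name `paste_object`, its parameters, and the grayscale nested-list image format are hypothetical and chosen only for illustration. The mask generator and discriminator networks are not shown.

```python
def paste_object(source, mask, target, src_box, dst_xy):
    """Blend the masked contents of src_box from `source` onto `target`.

    Assumption (not from the paper): images are grayscale nested lists of
    floats, `mask` has the same size as the box with values in [0, 1], and
    compositing is a plain alpha blend.
    """
    x0, y0, x1, y1 = src_box          # box corners, x1/y1 exclusive
    dx, dy = dst_xy                   # top-left corner of the paste location
    out = [row[:] for row in target]  # copy so the target is left unmodified
    for j in range(y1 - y0):
        for i in range(x1 - x0):
            m = mask[j][i]
            obj = source[y0 + j][x0 + i]
            bg = target[dy + j][dx + i]
            # m = 1 keeps the object pixel, m = 0 keeps the background pixel
            out[dy + j][dx + i] = m * obj + (1.0 - m) * bg
    return out


# Toy example: a 4x4 source, a uniform 4x4 target, and a 2x2 soft mask.
source = [[float(10 * r + c) for c in range(4)] for r in range(4)]
target = [[100.0] * 4 for _ in range(4)]
mask = [[1.0, 0.0],
        [0.5, 1.0]]
composite = paste_object(source, mask, target, src_box=(1, 1, 3, 3), dst_xy=(0, 0))
```

In an adversarial setup of the kind described, a composite like this would be shown to the discriminator alongside real images. A poor mask leaves visible background or clips the object, which makes the paste easier to detect.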