Abstract
We propose a weakly supervised semantic segmentation
algorithm that uses image tags for supervision. We use the tags as search queries to collect three sets of web images,
which capture clean foregrounds, common backgrounds, and realistic scenes of the classes. We introduce
a novel three-stage training pipeline to progressively learn
semantic segmentation models. We first train and refine a
class-specific shallow neural network for each class to obtain its segmentation masks. The shallow neural networks of
all classes are then assembled into one deep convolutional
neural network for end-to-end training and testing. Experiments show that our method notably outperforms previous
state-of-the-art weakly supervised semantic segmentation
approaches on the PASCAL VOC 2012 segmentation benchmark. We further apply the class-specific shallow neural
networks to object segmentation and obtain excellent results.