Abstract
There have been remarkable improvements in the semantic labelling task in recent years. However,
state-of-the-art methods rely on large-scale pixel-level annotations. This paper studies the problem of training a
pixel-wise semantic labeller network from image-level annotations of the present object classes. Recently, it has
been shown that high-quality seeds indicating discriminative object regions can be obtained from image-level labels. Without additional information, obtaining the full extent of the object is an inherently ill-posed problem due to
co-occurrences. We propose using a saliency model as additional information and thereby exploit prior knowledge on
the object extent and image statistics. We show how to combine both information sources in order to recover 80% of
the fully supervised performance – which is the new state
of the art in weakly supervised training for pixel-wise semantic labelling.