Abstract. Weakly supervised methods usually generate localization results based on attention maps produced by classification networks. However, these attention maps highlight only the most discriminative parts of the object, which are small and sparse. We propose to generate Self-produced
Guidance (SPG) masks, which separate the foreground, i.e., the object
of interest, from the background to provide the classification networks
with spatial correlation information of pixels. A stagewise approach is
proposed in which high-confidence object regions within the attention
maps are used to progressively learn the SPG masks. The masks are then used as
auxiliary pixel-level supervision to facilitate the training of classification
networks. Extensive experiments on ILSVRC demonstrate that SPG is
effective in producing high-quality object localization maps. In particular, SPG achieves a Top-1 localization error rate of
43.83% on the ILSVRC validation set, which is a new state-of-the-art
error rate.
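
As an illustration of how high-confidence regions of an attention map might be turned into a foreground/background seed mask, the following is a minimal sketch and not the implementation described in this paper. The function name `seed_mask`, the thresholds `low` and `high`, the `IGNORE` label, and the assumption that attention values are normalized to [0, 1] are all hypothetical choices made for the example.

```python
from typing import List

IGNORE = -1  # hypothetical label for pixels that are not confidently assigned


def seed_mask(attention: List[List[float]],
              low: float = 0.2,
              high: float = 0.8) -> List[List[int]]:
    """Turn an attention map (values assumed in [0, 1]) into a ternary mask.

    Pixels with attention >= high become foreground (1), pixels with
    attention <= low become background (0), and everything in between
    is marked IGNORE so it would not contribute to a pixel-level loss.
    The thresholds are illustrative assumptions, not values from the paper.
    """
    mask = []
    for row in attention:
        mask_row = []
        for a in row:
            if a >= high:
                mask_row.append(1)
            elif a <= low:
                mask_row.append(0)
            else:
                mask_row.append(IGNORE)
        mask.append(mask_row)
    return mask


# Toy 3x3 attention map with a small, highly activated object region.
attention = [
    [0.05, 0.10, 0.30],
    [0.15, 0.90, 0.85],
    [0.40, 0.95, 0.60],
]
mask = seed_mask(attention)
for row in mask:
    print(row)
```

In a stagewise scheme of the kind the abstract describes, a mask like this could supervise only the confidently labeled pixels, leaving the ignored pixels to be resolved in later stages.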