SPFTN: A Self-Paced Fine-Tuning Network for Segmenting Objects
in Weakly Labelled Videos
Abstract
Object segmentation in weakly labelled videos is an interesting yet challenging task that aims to learn
category-specific video object segmentation using only video-level tags. Existing works in this research
area still have some limitations, e.g., the lack of effective DNN-based learning frameworks, under-exploration of
context information, and reliance on an unstable
negative video collection, which prevent them from achieving better performance. To this end, we propose a
novel self-paced fine-tuning network (SPFTN)-based framework, which learns to explore the context information
within video frames and to capture adequate object semantics without using negative videos. To perform weakly
supervised learning with a deep neural network, we
make the first attempt to integrate the self-paced learning regime with the deep neural network in a unified and
compatible framework, leading to the self-paced fine-tuning
network. Comprehensive experiments on the large-scale
YouTube-Objects and DAVIS datasets demonstrate that the
proposed approach outperforms other state-of-the-art methods as well as the
baseline networks and models.