Abstract
Online video object segmentation is a challenging task
because it requires processing the image sequence both quickly and accurately. To segment a target object throughout a video, numerous CNN-based methods have been developed that rely on heavy
fine-tuning on the object mask in the first frame, which is
time-consuming for online applications. In this paper, we
propose a fast and accurate video object segmentation algorithm that can start the segmentation process as soon as it receives the images. We first use a part-based tracking method to deal with challenging factors such
as large deformation, occlusion, and cluttered background.
Based on the tracked bounding boxes of parts, we construct a region-of-interest segmentation network to generate
part masks. Finally, a similarity-based scoring function is
adopted to refine these object parts by comparing them to
the visual information in the first frame. Our method performs favorably against state-of-the-art algorithms in accuracy on the DAVIS benchmark dataset, while achieving
a much faster runtime.
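
To make the final refinement step concrete, the following is a minimal sketch of a similarity-based filter over object parts. It is only an illustration under stated assumptions: the abstract does not specify the scoring function, so the use of cosine similarity, the fixed threshold, the toy feature vectors, and the names `cosine_similarity` and `select_parts` are all hypothetical choices and are not taken from the proposed method.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_parts(
    part_features: Sequence[Sequence[float]],
    reference_feature: Sequence[float],
    threshold: float = 0.8,
) -> List[int]:
    """Return indices of parts whose feature is close to the first-frame reference.

    Assumption: each part is summarized by one feature vector, and a part is
    kept when its similarity to the reference reaches the threshold.
    """
    return [
        i
        for i, feature in enumerate(part_features)
        if cosine_similarity(feature, reference_feature) >= threshold
    ]


# Toy data (hypothetical): one reference descriptor from the first frame
# and three candidate part descriptors from a later frame.
reference = [1.0, 0.0, 1.0]
parts = [[0.9, 0.1, 1.1], [0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
kept = select_parts(parts, reference, threshold=0.8)
print(kept)
```

In this toy setting, only the parts that resemble the first-frame descriptor survive the filter; in practice the descriptors would come from a learned feature extractor, and the threshold would need to be tuned.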