Spatiotemporal CNN for Video Object Segmentation

2019-09-11

Abstract: In this paper, we present a unified, end-to-end trainable spatiotemporal CNN model for VOS, which consists of two branches, i.e., the temporal coherence branch and the spatial segmentation branch. Specifically, the temporal coherence branch, pretrained in an adversarial fashion from unlabeled video data, is designed to capture the dynamic appearance and motion cues of video sequences to guide object segmentation. The spatial segmentation branch focuses on segmenting objects accurately based on the learned appearance and motion cues. To obtain accurate segmentation results, we design a coarse-to-fine process to sequentially apply a designed attention module on multi-scale feature maps, and concatenate them to produce the final prediction. In this way, the spatial segmentation branch is enforced to gradually concentrate on object regions. These two branches are jointly fine-tuned on video segmentation sequences in an end-to-end manner. Several experiments are carried out on three challenging datasets (i.e., DAVIS-2016, DAVIS-2017 and Youtube-Object) to show that our method achieves favorable performance against the state-of-the-art methods. Code is available at https://github.com/longyin880815/STCNN.
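The coarse-to-fine step described above — applying an attention module at each scale of the feature pyramid, then concatenating the results for the final prediction — can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the `attention` function (a 1x1 convolution followed by a sigmoid gate), the random weights, and the nearest-neighbour upsampling are all simplifying assumptions chosen to make the data flow concrete.

```python
import numpy as np

def attention(feat, w):
    """Hypothetical attention module: a 1x1 conv (channel-collapsing
    dot product) plus a sigmoid yields a spatial attention map that
    reweights the feature map, emphasizing object regions."""
    # feat: (C, H, W); w: (C,) acts as a 1x1 convolution kernel
    logits = np.tensordot(w, feat, axes=([0], [0]))   # (H, W)
    attn = 1.0 / (1.0 + np.exp(-logits))              # sigmoid in (0, 1)
    return feat * attn[None, :, :]                    # gated features

def upsample_nn(feat, factor):
    """Nearest-neighbour upsampling to the finest resolution."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def coarse_to_fine(feats, weights):
    """Apply attention at each scale (coarse to fine), upsample every
    gated map to the finest resolution, and concatenate along the
    channel axis as input for the final prediction head."""
    target_h = feats[-1].shape[1]
    fused = []
    for feat, w in zip(feats, weights):
        gated = attention(feat, w)
        fused.append(upsample_nn(gated, target_h // feat.shape[1]))
    return np.concatenate(fused, axis=0)

rng = np.random.default_rng(0)
# Three scales of a toy feature pyramid: 8x8, 16x16, 32x32, 4 channels each
feats = [rng.standard_normal((4, s, s)) for s in (8, 16, 32)]
weights = [rng.standard_normal(4) for _ in feats]
out = coarse_to_fine(feats, weights)
print(out.shape)  # (12, 32, 32): 3 scales x 4 channels at the finest resolution
```

Because each sigmoid gate lies in (0, 1), every scale's contribution is damped outside its high-attention regions, which is the mechanism by which the segmentation branch is gradually concentrated on the object.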

