FusionSeg: Learning to combine motion and appearance for fully automatic segmentation of generic objects in videos

2019-12-09
Abstract: We propose an end-to-end learning framework for segmenting generic objects in videos. Our method learns to combine appearance and motion information to produce pixel-level segmentation masks for all prominent objects. We formulate the task as a structured prediction problem and design a two-stream fully convolutional neural network that fuses motion and appearance in a unified framework. Since large-scale video datasets with pixel-level segmentations are lacking, we show how to bootstrap weakly annotated videos together with existing image recognition datasets for training. In experiments on three challenging video segmentation benchmarks, our method substantially improves on state-of-the-art results for segmenting generic (unseen) objects. Code and pretrained models are available on the project website.
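To make the idea of fusing two streams concrete, the following is a minimal sketch of per-pixel late fusion, not the authors' network. It assumes each stream has already produced a map of foreground logits for one frame, and it combines them with a weighted sum followed by a sigmoid. The function names (`fuse_streams`, `to_mask`), the weights `w_app` and `w_mot`, the bias, and the 0.5 threshold are all hypothetical choices made for illustration; the abstract does not specify the fusion operator, and in the actual model the fusion is learned end to end.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_streams(appearance_logits, motion_logits, w_app=1.0, w_mot=1.0, bias=0.0):
    """Fuse two H x W logit maps into one foreground probability map.

    Assumption: a simple weighted sum stands in for the learned fusion
    layers of a two-stream network. The weights here are fixed by hand.
    """
    if len(appearance_logits) != len(motion_logits):
        raise ValueError("stream outputs must have the same height")
    fused = []
    for row_a, row_m in zip(appearance_logits, motion_logits):
        if len(row_a) != len(row_m):
            raise ValueError("stream outputs must have the same width")
        fused.append(
            [sigmoid(w_app * a + w_mot * m + bias) for a, m in zip(row_a, row_m)]
        )
    return fused


def to_mask(prob_map, threshold=0.5):
    """Binarize a probability map: 1 = object pixel, 0 = background."""
    return [[1 if p > threshold else 0 for p in row] for row in prob_map]


# Toy 2 x 2 frame: the streams disagree on the bottom row.
appearance = [[2.0, -2.0], [0.5, -3.0]]
motion = [[1.0, -1.0], [-2.0, 4.0]]
mask = to_mask(fuse_streams(appearance, motion))
```

In this toy frame the bottom-right pixel looks like background to the appearance stream but shows strong motion evidence, so the fused mask marks it as foreground. This is the kind of complementary cue the abstract refers to when it describes combining motion and appearance.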
