Movie/Script: Alignment and Parsing of Video and Text Transcription

资源分类

2020-03-30 |

69 |

31 |

Abstract

Movies and TV are a rich source of diverse and complex video of peo- ple, objects, actions and locales “in the wild”. Harvesting automatically labeled sequences of actions from video would enable creation of large-scale and highly- varied datasets. To enable such collection, we focus on the task of recovering scene structure in movies and TV series for object tracking and action retrieval. We present a weakly supervised algorithm that uses the screenplay and closed captions to parse a movie into a hierarchy of shots and scenes. Scene boundaries in the movie are aligned with screenplay scene labels and shots are reordered into a sequence of long continuous tracks or threads which allow for more ac- curate tracking of people, actions and objects. Scene segmentation, alignment, and shot threading are formulated as inference in a uni fied generative model and a novel hierarchical dynamic programming algorithm that can handle alignment and jump-limited reorderings in linear time is presented. We present quantitative and qualitative results on movie alignment and parsing, and use the recovered structure to improve character naming and retrieval of common actions in several episodes of popular TV series.

上一篇：A Column-Pivoting Based Strategy for Monomial Ordering in Numerical Gr?bner Basis Calculations

下一篇：An Extended Phase Field Higher-Order Active Contour Model for Networks and Its Application to Road Network Extraction from VHR Satellite Images

用户评价

全部评价

还没有评论，说两句吧！

热门资源

Learning to Predi...

Much of model-based reinforcement learning invo...
Stratified Strate...

In this paper we introduce Stratified Strategy ...
The Variational S...

Unlike traditional images which do not offer in...
Learning to learn...

The move from hand-designed features to learned...
A Mathematical Mo...

Direct democracy, where each voter casts one vo...

智能在线

400-630-6780
聆听.建议反馈

E-mail: support@tusaishared.com