Unsupervised Alignment of Actions in Video with Text Descriptions

2019-11-27

Abstract: Advances in video technology and data storage have made large-scale video data collections of complex activities readily accessible. An increasingly popular approach for automatically inferring the details of a video is to associate the spatiotemporal segments in the video with its natural language descriptions. Most algorithms for connecting natural language with video rely on pre-aligned supervised training data. Recently, several models have been shown to be effective for unsupervised alignment of objects in video with language. However, it remains difficult to generate good spatiotemporal video segments for actions that align well with language. This paper presents a framework that extracts higher-level representations of low-level action features from video through hyperfeature coding and aligns them with language. We propose a two-step process that first creates a high-level action feature codebook with temporally consistent motions, and then applies an unsupervised alignment algorithm over the action codewords and the verbs in the language to identify individual activities. We show an improvement over previous alignment models of objects and nouns on videos of biological experiments, and also evaluate our system on a larger-scale collection of videos involving kitchen activities.
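The second step of the process described above, aligning action codewords with verbs without supervision, can be illustrated with a small sketch. The abstract does not name the alignment algorithm, so the code below substitutes IBM Model 1 style EM as a stand-in; it is not the paper's method. The function names `train_alignment` and `best_codeword`, the codeword labels `k1` to `k3`, and the toy verb lists are all hypothetical, and the codewords are assumed to have already been produced by the codebook step.

```python
from collections import defaultdict


def train_alignment(pairs, iterations=20):
    """Estimate t[verb][codeword] with IBM Model 1 style EM.

    pairs: list of (verbs, codewords) tuples, one per video.
    This is an assumed stand-in for the paper's alignment step.
    """
    vocab = {c for _, cs in pairs for c in cs}
    uniform = 1.0 / len(vocab)
    # Start from a uniform distribution over the codeword vocabulary.
    t = defaultdict(lambda: defaultdict(lambda: uniform))
    for _ in range(iterations):
        count = defaultdict(lambda: defaultdict(float))
        total = defaultdict(float)
        for verbs, cs in pairs:
            if not verbs:
                continue  # nothing to align this video's codewords to
            for c in cs:
                # E-step: split responsibility for codeword c among the verbs.
                norm = sum(t[v][c] for v in verbs)
                for v in verbs:
                    frac = t[v][c] / norm
                    count[v][c] += frac
                    total[v] += frac
        # M-step: renormalise the expected counts for each verb.
        new_t = defaultdict(lambda: defaultdict(float))
        for v in count:
            for c in count[v]:
                new_t[v][c] = count[v][c] / total[v]
        t = new_t
    return t


def best_codeword(t, verb):
    """Return the codeword with the highest estimated probability for a verb."""
    return max(t[verb], key=t[verb].get)


# Toy data (hypothetical): verbs from each description paired with the
# action codewords observed in the corresponding video.
pairs = [
    (["pour", "stir"], ["k1", "k2"]),
    (["pour"], ["k1"]),
    (["stir", "cut"], ["k2", "k3"]),
    (["cut"], ["k3"]),
]
table = train_alignment(pairs)
for verb in ("pour", "stir", "cut"):
    print(verb, "->", best_codeword(table, verb))
```

No video-level labels tie a specific verb to a specific codeword here; the pairing emerges only from co-occurrence across videos, which is the sense in which such an alignment is unsupervised.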
