Abstract
This paper focuses on transcription generation in the form of subject, verb, object (SVO) triplets for videos in the wild, given off-the-shelf visual concept detectors. This problem is challenging due to the availability of sentence-only annotations, the unreliability of concept detectors, and the lack of training samples for many words. Facing these challenges, we propose a Semantic Aware Transcription (SAT) framework based on Random Forest classifiers. It takes concept detection results as input and outputs a distribution over English words. SAT uses video-sentence pairs for training. It learns node splits hierarchically by grouping semantically similar words, with similarity measured by a continuous skip-gram language model. This not only addresses the sparsity of training samples per word, but also yields semantically reasonable errors during transcription. SAT also provides a systematic way to measure the relatedness of a concept detector to real words, which helps us understand the relationship between current visual detectors and words in a semantic space. Experiments on a large video dataset with 1,970 clips and 85,550 sentences demonstrate the effectiveness of our approach.