Multi-Task Deep Visual-Semantic Embedding for Video Thumbnail Selection

2019-12-25

Abstract

Given the tremendous growth of online videos, the video thumbnail, as the common visualization form of video content, is becoming increasingly important in shaping users' browsing and searching experience. However, conventional methods for video thumbnail selection often fail to produce satisfying results because they ignore the side semantic information (e.g., title, description, and query) associated with the video. As a result, the selected thumbnail cannot always represent the video semantics, and the click-through rate is adversely affected even when the retrieved videos are relevant. In this paper, we develop a multi-task deep visual-semantic embedding model that automatically selects query-dependent video thumbnails according to both visual and side information. Unlike most existing methods, the proposed approach employs the deep visual-semantic embedding model to directly compute the similarity between the query and video thumbnails by mapping them into a common latent semantic space, where even unseen query-thumbnail pairs can be correctly matched. In particular, we train the embedding model on large-scale, freely accessible click-through video and image data, and employ a multi-task learning strategy to holistically exploit the query-thumbnail relevance from these two highly related datasets. Finally, a thumbnail is selected by fusing both the representative and query relevance scores. Evaluations on a 1,000 query-thumbnail dataset labeled by 191 workers on Amazon Mechanical Turk demonstrate the effectiveness of the proposed method.
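
The final selection step described in the abstract (scoring each candidate by its similarity to the query in the shared space, then fusing that with a representativeness score) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure or the fusion rule, so cosine similarity, the linear weighting, and the names `cosine_similarity`, `select_thumbnail`, and `alpha` are all assumptions made for illustration. The embeddings are taken as given; no trained model is involved.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors assumed to live in the same embedding space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_u * norm_v)


def select_thumbnail(
    query_vec: Sequence[float],
    frame_vecs: List[Sequence[float]],
    representative_scores: Sequence[float],
    alpha: float = 0.5,
) -> int:
    """Return the index of the candidate frame with the highest fused score.

    Assumption: the fusion is a weighted sum controlled by the hypothetical
    parameter `alpha`; the abstract only says the two scores are fused.
    """
    if len(frame_vecs) != len(representative_scores):
        raise ValueError("one representativeness score is needed per candidate frame")
    fused = [
        alpha * cosine_similarity(query_vec, frame) + (1.0 - alpha) * rep
        for frame, rep in zip(frame_vecs, representative_scores)
    ]
    return max(range(len(fused)), key=fused.__getitem__)


# Toy example with made-up 2-D embeddings and scores.
query = [1.0, 0.0]
frames = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
rep_scores = [0.9, 0.4, 0.5]
best = select_thumbnail(query, frames, rep_scores, alpha=0.5)
```

With `alpha=0.5` the second frame wins because it matches the query exactly; with `alpha=0.0` the query is ignored and the most representative frame (the first) is chosen, which corresponds to query-independent selection.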
