Multi-modal Circulant Fusion for Video-to-Language and Backward

Aming Wu and Yahong Han
Abstract
Multi-modal fusion has been widely involved in modern artificial intelligence research, e.g., translating visual content to language and vice versa. Commonly used multi-modal fusion methods mainly include element-wise product, element-wise sum, or simple concatenation between different types of features, which are straightforward but lack in-depth analysis. Recent studies have shown that fully exploiting interactions among elements of multi-modal features leads to further performance gains. In this paper, we put forward a new approach to multi-modal fusion, namely Multi-modal Circulant Fusion (MCF). Particularly, after reshaping feature vectors into circulant matrices, we define two types of interaction operations between vectors and matrices. As each row of the circulant matrix is shifted by one element, the newly defined interaction operations explore almost all possible interactions between vectors of different modalities. Moreover, as only regular operations defined a priori are involved, MCF avoids increasing the number of parameters or the computational cost of multi-modal fusion. We evaluate MCF on the tasks of video captioning and temporal activity localization via language (TALL). Experiments on MSVD and MSRVTT show that our method obtains state-of-the-art performance for video captioning. For TALL, by plugging in MCF, we achieve a performance gain of roughly 4.2% on TACoS.
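To make the idea of circulant fusion concrete, the sketch below shows one plausible way to reshape a vector into a circulant matrix and interact it with a vector from another modality. This is an illustration only, not the authors' released code: the function names, the choice of element-wise interaction averaged over shifts, and the final sum of the two directions are assumptions for exposition, and the exact operations used in MCF may differ.

```python
# Minimal sketch of a circulant-style fusion step, assuming both modality
# features have already been projected to a common dimension d.
import numpy as np

def circulant(x):
    """Build a d x d circulant matrix whose i-th row is x rolled by i positions."""
    d = x.shape[0]
    return np.stack([np.roll(x, i) for i in range(d)], axis=0)

def circulant_fusion(v, t):
    """Fuse a visual vector v and a textual vector t (both of length d).

    Each row of circulant(v) is a shifted copy of v, so interacting every row
    with t covers (almost) all pairwise element combinations of the two
    vectors. Here we use element-wise products averaged over all shifts; this
    particular interaction is an illustrative assumption.
    """
    V, T = circulant(v), circulant(t)
    f = (V * t).mean(axis=0)   # rows of circulant(v) interacting with t
    g = (T * v).mean(axis=0)   # rows of circulant(t) interacting with v
    return f + g               # fused representation, still of length d

if __name__ == "__main__":
    d = 8
    visual = np.random.randn(d)    # e.g., a projected CNN feature
    textual = np.random.randn(d)   # e.g., a projected word-embedding feature
    print(circulant_fusion(visual, textual).shape)  # (8,)
```

Because the sketch uses only shifts, element-wise products, and averaging, it introduces no additional learnable parameters, which mirrors the abstract's claim that MCF avoids extra parameters or computational cost.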