Abstract
Cross-modal tasks arise naturally for multimedia content that can be described along two or more modalities, such as visual content and text. Such tasks require “translating” information from one modality to another. Methods like kernelized canonical correlation analysis (KCCA) attempt to solve such tasks by finding aligned subspaces in the description spaces of the different modalities. Because they favor correlations over modality-specific information, these methods have shown some success in both cross-modal and bi-modal tasks. However, we show that directly using the subspace alignment obtained by KCCA yields only coarse translation abilities. To address this problem, we first put forward a new representation method that aggregates the information provided by the projections of both modalities onto their aligned subspaces. We further propose a method relying on neighborhoods in these subspaces to complete uni-modal information. Our approach achieves state-of-the-art results for bi-modal classification on Pascal VOC07 and improves the state of the art by over 60% for cross-modal retrieval on Flickr8K/30K.