Abstract
The problem of predicting image or video interestingness from low-level feature representations has received increasing interest. Because interestingness is a highly subjective visual attribute, annotating the interestingness values of training data for learning a prediction model is challenging. To make the annotation less subjective and more reliable, recent studies employ crowdsourcing tools to collect pairwise comparisons, relying on majority voting to prune annotation outliers and errors. In this paper, we propose a more principled way to identify annotation outliers by formulating interestingness prediction as a unified robust learning to rank problem, which tackles the outlier detection and interestingness prediction tasks jointly. Extensive experiments on both image and video interestingness benchmark datasets demonstrate that our new approach significantly outperforms state-of-the-art alternatives.
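The idea of detecting outliers and learning the ranking jointly can be illustrated with a minimal sketch. It assumes a linear score s(x) = wᵀx and gives each pairwise comparison its own outlier variable γ with an L1 penalty, fitted by alternating minimization. These modelling and solver choices are illustrative assumptions and not necessarily the paper's exact formulation. The functions `robust_rank` and `soft_threshold` are hypothetical names. Comparisons whose γ stays well above zero are the ones the model treats as likely annotation errors.

```python
import numpy as np

def soft_threshold(r, lam):
    # Elementwise shrinkage: the proximal operator of lam * |.|
    return np.sign(r) * np.maximum(np.abs(r) - lam, 0.0)

def robust_rank(X, pairs, lam=0.5, ridge=1e-3, n_iter=100):
    """Hypothetical sketch: fit a linear interestingness score and flag outlier votes.

    Minimises 0.5 * ||y - D w - gamma||^2 + 0.5 * ridge * ||w||^2 + lam * ||gamma||_1,
    where each row of D is x_i - x_j for a vote "item i is more interesting than item j"
    and every target y is +1 (an assumed model, chosen for illustration).
    """
    D = np.array([X[i] - X[j] for i, j in pairs])   # one feature-difference row per vote
    y = np.ones(len(pairs))                           # every vote says "i beats j"
    gamma = np.zeros(len(pairs))                      # per-vote outlier magnitudes
    A = D.T @ D + ridge * np.eye(D.shape[1])
    for _ in range(n_iter):
        # w-step: ridge least squares on the votes after removing the outlier part
        w = np.linalg.solve(A, D.T @ (y - gamma))
        # gamma-step: sparse residual; large entries mark suspicious votes
        gamma = soft_threshold(y - D @ w, lam)
    return w, gamma

# Toy 1-D features whose true interestingness order is 0 < 1 < 2 < 3 < 4.
X = np.arange(5, dtype=float).reshape(-1, 1)
# Nine consistent votes plus one flipped vote (0, 4) at index 9.
pairs = [(1, 0), (2, 1), (3, 2), (4, 3), (2, 0), (3, 1), (4, 2), (3, 0), (4, 1), (0, 4)]
w, gamma = robust_rank(X, pairs)
print(int(np.argmax(gamma)))  # → 9, the flipped vote
```

In this toy run the flipped vote absorbs a large γ, so its influence on w is capped. A plain least-squares fit, by contrast, would let that vote pull the score directly. This bounded-influence effect is what a joint formulation offers over pruning votes before training.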