Abstract
Tail label data (TLD) is prevalent in real-world tasks,
and large-scale multi-label learning (LMLL) is its
major learning scheme. Previous LMLL studies
typically must also account for extensive head label data (HLD), and thus fail to guide
the learning behavior of TLD. In many applications
such as recommender systems, however, predicting tail labels is essential, since it provides
important supplementary information. We call
this kind of problem tail label learning. In this
paper, we propose a novel method for the tail label
learning problem. Based on the observation that
the raw feature representation in LMLL data usually benefits HLD but may not be suitable for
TLD, we construct effective and rich label-specific
features by exploring the distribution of labeled data
and leveraging label correlations. Specifically, we
employ clustering analysis to explore discriminative features for each tail label, replacing the original
high-dimensional and sparse features. In addition,
due to the scarcity of positive examples of TLD, we
encode knowledge from HLD by exploiting label
correlations to enhance the label-specific features.
Experimental results verify the superiority of the
proposed method in terms of performance on TLD.
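
The two ideas summarized above, clustering-based label-specific features and enhancement through label correlations, can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper, and it rests on several assumptions: features are taken to be distances to k-means centers of a label's positive and negative examples, label correlation is measured by cosine similarity between label columns, and the names `kmeans`, `label_specific_features`, `most_correlated_head`, and the `ratio` parameter are hypothetical. The data are synthetic.

```python
import numpy as np

def kmeans(X, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers

def label_specific_features(X, y, ratio=0.5, seed=0):
    """Map X (n, d) to distances from cluster centers of one label's
    positive and negative examples (assumed feature construction).
    Requires at least one positive and one negative example, ratio <= 1."""
    pos, neg = X[y == 1], X[y == 0]
    k = max(1, int(np.ceil(ratio * min(len(pos), len(neg)))))
    centers = np.vstack([kmeans(pos, k, seed=seed), kmeans(neg, k, seed=seed)])
    return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)

def most_correlated_head(Y, tail, heads):
    """Return the head label whose column has the highest cosine
    similarity with the tail label's column (assumed correlation measure)."""
    t = Y[:, tail].astype(float)
    sims = []
    for h in heads:
        c = Y[:, h].astype(float)
        denom = np.linalg.norm(t) * np.linalg.norm(c)
        sims.append(t @ c / denom if denom > 0 else 0.0)
    return heads[int(np.argmax(sims))]

# Synthetic data: two head labels and one tail label with 4 positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
Y = np.zeros((40, 3), dtype=int)
Y[:20, 0] = 1      # head label 0
Y[10:30, 1] = 1    # head label 1
Y[:4, 2] = 1       # tail label 2, co-occurs only with head label 0

tail, heads = 2, [0, 1]
h = most_correlated_head(Y, tail, heads)
F_tail = label_specific_features(X, Y[:, tail])
F_head = label_specific_features(X, Y[:, h])
# Enhance the tail label's features with those of its most correlated head label.
F = np.hstack([F_tail, F_head])
```

In this sketch the tail label's few positives yield only a handful of cluster centers, so borrowing the feature space of a correlated head label gives the downstream classifier a richer representation. Any binary classifier could then be trained on `F` for the tail label.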