Abstract
In multi-label classification, each sample can be associated with a set of class labels. When the number of labels grows to the hundreds or even thousands, existing multi-label classification methods often become computationally inefficient. In recent years, a number of remedies have been proposed. However, they are based either on simple dimension reduction techniques or involve expensive optimization problems. In this paper, we address this problem by selecting a small subset of class labels that can approximately span the original label space. This is performed by an efficient randomized sampling procedure where the sampling probability of each class label reflects its importance among all the labels. Experiments on a number of realworld multi-label data sets with many labels demonstrate the appealing performance and efficiency of the proposed algorithm.