Abstract
Extreme classification seeks to assign each data
point the most relevant labels from a universe of
a million or more labels. This task faces
the dual challenge of high precision and scalability, with millisecond-level prediction times being
a benchmark. We propose DEFRAG, an adaptive
feature agglomeration technique to accelerate extreme classification algorithms. In contrast to past work
on feature clustering and selection, DEFRAG distinguishes itself by scaling to millions of
features and is especially beneficial when feature
sets are sparse, which is typical of recommendation
and multi-label datasets. The method comes with
provable performance guarantees and performs efficient
task-driven agglomeration to reduce feature
dimensionality by an order of magnitude or more.
Experiments show that DEFRAG can not only reduce training and prediction times of several leading extreme classification algorithms by as much as
40%, but also be used for feature reconstruction to
address the problem of missing features, as well as
offer superior coverage on rare labels.
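
To make the notion of feature agglomeration concrete, the following is a minimal Python sketch and not DEFRAG itself. It assumes that a feature-to-cluster assignment is already available (how DEFRAG computes its clusters is not described in this abstract) and that the features in a cluster are combined by summing their values. The names `agglomerate`, `rows`, and `cluster_of` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def agglomerate(rows, cluster_of):
    """Sum feature values within each feature cluster.

    rows: list of sparse vectors, each a dict {feature_index: value}.
    cluster_of: list mapping feature_index -> cluster id
                (assumed to be precomputed; illustrative only).
    Returns a list of sparse vectors indexed by cluster id.
    """
    reduced = []
    for row in rows:
        agg = defaultdict(float)
        for feat, val in row.items():
            # Only non-zero features are visited, so the cost per
            # data point is proportional to its sparsity.
            agg[cluster_of[feat]] += val
        reduced.append(dict(agg))
    return reduced


# Toy data: 3 data points over 6 sparse features.
rows = [{0: 1.0, 3: 2.0}, {1: 1.0, 2: 1.0}, {4: 3.0, 5: 1.0}]
# Hypothetical assignment of the 6 features to 3 clusters.
cluster_of = [0, 0, 1, 1, 2, 2]
reduced = agglomerate(rows, cluster_of)
```

In this toy case six features collapse into three "super-features". A downstream classifier would then be trained and queried on `reduced` rather than `rows`, which is where savings in training and prediction time could come from.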