Abstract
Multi-Label Hierarchical Text Classification (MLHTC) is the task of categorizing documents into one or more topics organized in a hierarchical taxonomy. MLHTC can be formulated as a combination of multiple binary classification problems, with an independent classifier for each category. We propose a novel transfer learning-based strategy, HTrans, in which binary classifiers at lower levels of the hierarchy are initialized with the parameters of the parent classifier and fine-tuned on the child category classification task. In HTrans, we use a Gated Recurrent Unit (GRU)-based deep learning architecture coupled with attention. Compared to binary classifiers trained from scratch, our HTrans approach yields significant improvements of 1% on micro-F1 and 3% on macro-F1 on the RCV1 dataset. Our experiments also show that binary classifiers trained from scratch significantly outperform single multi-label models.
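The transfer strategy can be illustrated with a minimal sketch (assuming PyTorch; names such as GRUAttentionClassifier and init_child_from_parent are illustrative, not from the paper): each category has an independent GRU-plus-attention binary classifier, and a child category's classifier starts from a copy of its parent's trained parameters before being fine-tuned.

```python
import copy
import torch
import torch.nn as nn

class GRUAttentionClassifier(nn.Module):
    """Binary classifier: GRU encoder with additive attention over hidden states."""
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)   # attention scorer over time steps
        self.out = nn.Linear(2 * hidden_dim, 1)    # logit for one (binary) category

    def forward(self, token_ids):
        h, _ = self.gru(self.embedding(token_ids))     # (batch, seq, 2*hidden)
        weights = torch.softmax(self.attn(h), dim=1)   # attention weights per position
        doc = (weights * h).sum(dim=1)                 # attended document vector
        return self.out(doc).squeeze(-1)               # category logit

def init_child_from_parent(parent: GRUAttentionClassifier) -> GRUAttentionClassifier:
    """HTrans-style initialization: copy the trained parent classifier's parameters
    into a fresh child classifier, which is then fine-tuned on the child category."""
    return copy.deepcopy(parent)

# Usage sketch:
# parent = GRUAttentionClassifier(vocab_size=50_000)
# ... train `parent` on its category's binary task ...
# child = init_child_from_parent(parent)
# ... fine-tune `child` on the child category's binary task ...
```

The design choice illustrated here is simply weight reuse: the child task does not train from random initialization but from a parent model that has already learned features relevant to the broader parent category.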