Abstract
Many real-world applications involve multilabel classification, in which the labels are organized in the form of a tree or directed acyclic graph (DAG). However, current research efforts typically ignore the label dependencies or can only exploit the dependencies in tree-structured hierarchies. In this paper, we present a novel hierarchical multilabel classification algorithm which can be used on both treeand DAG-structured hierarchies. The key idea is to formulate the search for the optimal consistent multi-label as the finding of the best subgraph in a tree/DAG. Using a simple greedy strategy, the proposed algorithm is computationally efficient, easy to implement, does not suffer from the problem of insufficient/skewed training data in classifier training, and can be readily used on large hierarchies. Theoretical results guarantee the optimality of the obtained solution. Experiments are performed on a large number of functional genomics data sets. The proposed method consistently outperforms the state-of-the-art method on both treeand DAG-structured hierarchies.