Determining Expert Research Areas with Multi-Instance
Learning of Hierarchical Multi-Label Classification Model
Abstract
Automatically identifying the research areas of academic/industry researchers is an important task for building expertise organizations or search systems. In general, this task can be viewed as text classification that generates a set of research areas given the expertise of a researcher like documents of publications. However, this task is challenging because the evidence of a research area may only exist in a few documents instead of all documents. Moreover, the research areas are often organized in a hierarchy, which limits the effectiveness of existing text categorization methods. This paper proposes a novel approach, Multi-instance Learning of Hierarchical Multi-label Classification Model (MIHML) for the task, which effectively identifies multiple research areas in a hierarchy from individual documents within the profile of a researcher. An ExpectationMaximization (EM) optimization algorithm is designed to learn the model parameters. Extensive experiments have been conducted to demonstrate the superior performance of proposed research with a real world application.