Abstract
Positive and Unlabeled (PU) learning aims to learn
a binary classifier from only positive and unlabeled
training data. State-of-the-art methods usually formulate PU learning as a cost-sensitive learning problem, in which every unlabeled example is simultaneously treated as both positive and negative with different class weights. However, the ground-truth label of an unlabeled example should be unique, so existing models inadvertently introduce label noise, which may lead to a biased classifier and deteriorated performance. To solve
this problem, this paper proposes a novel algorithm dubbed “Positive and Unlabeled learning with Label Disambiguation” (PULD). We first regard all the unlabeled examples in PU learning as ambiguously labeled with both positive and negative candidate labels, and then employ a margin-based label disambiguation strategy, which enlarges the margin of the classifier's response between the most likely label and the less likely one, to identify the unique ground-truth label of each unlabeled example. Theoretically, we derive the
generalization error bound of the proposed method
by analyzing its Rademacher complexity. Experimentally, we conduct intensive experiments on both
benchmark and real-world datasets, and the results
clearly demonstrate the superiority of the proposed
PULD over existing PU learning approaches.
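The margin-based disambiguation idea described above can be sketched as follows. This is a minimal illustrative sketch on hypothetical toy data, not the authors' implementation: the linear model, hinge-loss training, sample weights, and the alternating loop are all assumptions made for illustration, and PULD's actual objective and optimization are defined in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical toy PU data: 50 labeled positives, 100 unlabeled ---
P     = rng.normal(loc=( 2.0,  2.0), scale=0.5, size=(50, 2))  # labeled positives
U_pos = rng.normal(loc=( 2.0,  2.0), scale=0.5, size=(50, 2))  # unlabeled, truly +1
U_neg = rng.normal(loc=(-2.0, -2.0), scale=0.5, size=(50, 2))  # unlabeled, truly -1
U = np.vstack([U_pos, U_neg])
true_u = np.array([1] * 50 + [-1] * 50)  # hidden ground truth, for evaluation only

X = np.vstack([P, U])
sample_w = np.array([2.0] * len(P) + [1.0] * len(U))  # assumption: trust labeled positives more

def fit_hinge(X, y, sw, lam=1e-3, lr=0.1, steps=500):
    """Weighted linear classifier trained by subgradient descent on the hinge loss."""
    w = np.zeros(X.shape[1]); b = 0.0
    n = len(X)
    for _ in range(steps):
        s = X @ w + b
        viol = (y * s < 1.0)                       # margin violators
        gw = -((sw * y * viol) @ X) / n + lam * w  # subgradient w.r.t. w
        gb = -np.sum(sw * y * viol) / n            # subgradient w.r.t. b
        w -= lr * gw; b -= lr * gb
    return w, b

def disambiguate(scores):
    """Pick, per unlabeled example, the candidate label with the larger
    classifier response; the margin is the gap between the responses of
    the more likely and less likely candidate labels."""
    labels = np.where(scores >= 0.0, 1, -1)
    margins = 2.0 * np.abs(scores)  # response(+1) - response(-1) = 2|s| for a linear score s
    return labels, margins

# Alternate: fit on current pseudo-labels, then re-disambiguate the unlabeled part.
pseudo = -np.ones(len(U))  # naive start: treat every unlabeled example as negative
for _ in range(5):
    y = np.concatenate([np.ones(len(P)), pseudo])
    w, b = fit_hinge(X, y, sample_w)
    pseudo, _ = disambiguate(U @ w + b)

acc = np.mean(pseudo == true_u)  # how many unlabeled examples were disambiguated correctly
```

On this separable toy data, the alternating loop recovers most of the hidden labels of the unlabeled examples; in the paper, disambiguation is driven by the margin objective rather than by the hard re-labeling shortcut used here.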