Abstract
Large-scale datasets with clean label annotations are crucial for training Convolutional Neural Networks (CNNs). However, labeling large-scale data can be costly and error-prone, and even high-quality datasets are likely to contain noisy (incorrect) labels. Existing works usually adopt a closed-set assumption, whereby each sample with a noisy label has a true class that lies within the set of known classes in the training data.
Such an assumption is too restrictive for many applications, however, since a sample with a noisy label may in fact belong to a true class that is not present in the training data at all. We refer to this more complex scenario as the open-set noisy label problem and show that making accurate predictions in this setting is nontrivial.
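To make the distinction concrete, here is a minimal sketch of our own (not part of the paper) of how the two noise types could be simulated, assuming CIFAR-10 as the in-distribution set: closed-set noise flips a label to a different known class, while open-set noise keeps an in-distribution label but attaches it to an image drawn from outside the known classes. All names (`add_closed_set_noise`, `outside_images`) and the noise rates are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES = 10  # e.g. CIFAR-10

def add_closed_set_noise(labels, noise_rate):
    """Closed-set noise: flip some labels to a *different known* class."""
    labels = labels.copy()
    idx = rng.choice(len(labels), int(noise_rate * len(labels)), replace=False)
    for i in idx:
        others = [c for c in range(NUM_CLASSES) if c != labels[i]]
        labels[i] = rng.choice(others)
    return labels, idx

def add_open_set_noise(images, outside_images, noise_rate):
    """Open-set noise: swap some images for out-of-distribution ones,
    keeping their (now necessarily wrong) in-distribution labels."""
    images = images.copy()
    idx = rng.choice(len(images), int(noise_rate * len(images)), replace=False)
    images[idx] = outside_images[rng.choice(len(outside_images), size=len(idx))]
    return images, idx
```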
To address this problem, we propose a novel iterative learning framework for training CNNs on datasets with open-set noisy labels. Our approach detects noisy labels and learns deep discriminative features in an iterative fashion. To benefit from the noisy-label detection, we design a Siamese network that encourages the representations of clean-labeled and noisy-labeled samples to be dissimilar. A reweighting module is also applied to simultaneously emphasize learning from clean labels and reduce the effect of noisy labels.
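The abstract only names these two components; as a hedged sketch of what they could look like, assuming a standard contrastive-style Siamese objective over pairs tagged clean/noisy by the detector, and per-sample weights `clean_prob` produced by that detector (the margin value and all names are our assumptions, not the paper's implementation):

```python
import torch.nn.functional as F

def siamese_contrastive_loss(feat_a, feat_b, same_status, margin=2.0):
    """Pull together pairs with the same clean/noisy status; push
    clean-vs-noisy pairs at least `margin` apart in feature space.
    `same_status` is a float tensor: 1.0 if both samples share a status
    (both clean or both noisy), 0.0 for a clean/noisy pair."""
    d = F.pairwise_distance(feat_a, feat_b)
    loss = same_status * d.pow(2) + (1.0 - same_status) * F.relu(margin - d).pow(2)
    return loss.mean()

def reweighted_cross_entropy(logits, labels, clean_prob):
    """Per-sample weighted cross-entropy: samples the detector judges
    likely clean (high `clean_prob`) dominate the gradient, while
    likely-noisy samples are down-weighted."""
    per_sample = F.cross_entropy(logits, labels, reduction="none")
    return (clean_prob * per_sample).sum() / clean_prob.sum().clamp(min=1e-8)
```

In the framework described above, detection, feature learning, and reweighting alternate iteratively; this sketch shows only plausible per-batch losses.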
Experiments on CIFAR-10, ImageNet, and real-world noisy (web-search) datasets demonstrate that our proposed model can robustly train CNNs in the presence of a high proportion of open-set as well as closed-set noisy labels.