CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise
Abstract
In this paper, we study the problem of learning image
classification models with label noise. Existing approaches
depending on human supervision are generally not scalable as manually identifying correct or incorrect labels is
time-consuming, whereas approaches not relying on human supervision are scalable but less effective. To reduce
the amount of human supervision for label noise cleaning,
we introduce CleanNet, a joint neural embedding network,
which requires only a fraction of the classes to be manually verified to provide knowledge of label noise that
can be transferred to other classes. We further integrate
CleanNet and a conventional convolutional neural network
classifier into one framework for image classification learning. We demonstrate the effectiveness of the proposed algorithm on both the label noise detection task and the task of
image classification on noisy data, using several large-scale datasets. Experimental results show that CleanNet
can reduce the label noise detection error rate on held-out
classes where no human supervision is available by 41.5%
compared to current weakly supervised methods. It also
achieves 47% of the performance gain of verifying all images with only 3.2% of the images verified on an image classification
task. Source code and dataset will be available at
kuanghuei.github.io/CleanNetProject
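
To make the "47% of the performance gain" figure concrete, the following minimal sketch shows one way such a relative gain could be computed. The function name and the accuracy values are hypothetical and are not results from this paper. The formula is an assumed reading of the claim: the gain over a baseline trained on noisy labels, divided by the gain obtained when all images are verified.

```python
def fraction_of_gain(acc_baseline, acc_method, acc_full):
    """Share of the full-verification gain that a method recovers.

    acc_baseline: accuracy when training on the noisy labels as-is
    acc_method:   accuracy of the method under evaluation
    acc_full:     accuracy when every image label has been verified
    (All three names are illustrative assumptions.)
    """
    full_gain = acc_full - acc_baseline
    if full_gain == 0:
        raise ValueError("full verification yields no gain over the baseline")
    return (acc_method - acc_baseline) / full_gain


# Hypothetical accuracies, chosen only so the ratio is roughly 0.47.
ratio = fraction_of_gain(acc_baseline=70.0, acc_method=74.7, acc_full=80.0)
print(f"fraction of full-verification gain: {ratio:.2f}")
```

Under this reading, a ratio near 0.47 means the method closes about half of the gap between the noisy baseline and the fully verified setting.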