Abstract
We present a theoretically grounded approach to train
deep neural networks, including recurrent networks, subject
to class-dependent label noise. We propose two procedures
for loss correction that are agnostic to both application domain and network architecture. They simply amount to at
most a matrix inversion and multiplication, provided that
we know the probability of each class being corrupted into
another. We further show how one can estimate these probabilities, adapting a recent technique for noise estimation
to the multi-class setting, and thus providing an end-to-end
framework. Extensive experiments on MNIST, IMDB, CIFAR-10,
CIFAR-100 and a large-scale dataset of clothing images
employing a diversity of architectures — stacking dense,
convolutional, pooling, dropout, batch normalization, word
embedding, LSTM and residual layers — demonstrate the
noise robustness of our proposals. Incidentally, we also
prove that, when ReLU is the only non-linearity, the loss
curvature is immune to class-dependent label noise.
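To make the two loss corrections concrete, the following is a minimal NumPy sketch under our own conventions (function names and the single-example interface are illustrative; the approach itself is architecture-agnostic). Forward correction passes the softmax output through the noise matrix before computing cross-entropy; backward correction combines the per-class losses with coefficients from the row of the inverse noise matrix indexed by the observed noisy label. The last function sketches the multi-class noise estimation idea: for each class, read the predictive vector of the example the network is most confident about.

```python
import numpy as np

def forward_corrected_ce(probs, noisy_label, T):
    """Forward correction: cross-entropy against T^T p instead of p.

    probs:       (c,) softmax output of the network for one example
    noisy_label: observed (possibly corrupted) class index
    T:           (c, c) noise matrix, T[i, j] = P(noisy = j | true = i)
    """
    corrected = T.T @ probs          # mix predictions through the noise process
    return -np.log(corrected[noisy_label])

def backward_corrected_ce(probs, noisy_label, T):
    """Backward correction: linear combination of per-class losses,
    weighted by the row of T^{-1} indexed by the noisy label."""
    per_class_loss = -np.log(probs)  # loss the example would incur under each label
    T_inv = np.linalg.inv(T)         # requires T to be invertible
    return T_inv[noisy_label] @ per_class_loss

def estimate_T(probs_matrix):
    """Sketch of multi-class noise estimation: for each class i, take the
    training example the network is most confident about for class i and
    read off its full predictive vector as row i of the estimate.

    probs_matrix: (n, c) softmax outputs on the noisy training set
    """
    c = probs_matrix.shape[1]
    T_hat = np.empty((c, c))
    for i in range(c):
        anchor = np.argmax(probs_matrix[:, i])  # most confident example for class i
        T_hat[i] = probs_matrix[anchor]
    return T_hat
```

In practice the estimate is computed from a network first trained on the noisy data, and a high percentile of the per-class confidence can stand in for the hard argmax to reduce sensitivity to outliers; the rows of the resulting matrix may also be renormalized before inversion.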