Abstract
Deep learning methods have become the de facto standard for challenging image processing tasks such as image classification. One major hurdle of deep learning approaches is that large sets of labeled data are necessary,
which can be prohibitively costly to obtain, particularly
in medical image diagnosis applications. Active learning
techniques can alleviate this labeling effort. In this paper, we investigate several recently proposed methods for active learning with high-dimensional data and convolutional
neural network classifiers. We compare ensemble-based
methods against Monte-Carlo Dropout and geometric approaches. We find that ensembles perform better and lead to
better-calibrated predictive uncertainties, which are the basis for many active learning algorithms. To investigate why
Monte-Carlo Dropout uncertainties perform worse, we examine potential differences between the two approaches in isolation in a series of experiments. We show results for MNIST and CIFAR-10, on which
we achieve a test set accuracy of 90% with roughly 12,200
labeled images, and initial results on ImageNet. Additionally, we show results on a large, highly class-imbalanced
diabetic retinopathy dataset. We observe that ensemble-based active learning effectively counteracts this imbalance
during acquisition.