Abstract
In computer vision, selecting the most informative samples from a large pool of training data in order to learn a good recognition model is an active research problem. Such selection also reduces annotation cost, since annotating unlabeled samples is time consuming. In this paper, motivated by theories of data compression, we propose a novel sample selection strategy that exploits the concept of typicality from information theory. Typicality is a simple yet powerful technique that can be applied to compress the training data needed to learn a good classification model. In this work, typicality is used to identify a subset of the most informative samples for labeling, which is then used to update the model via active learning. The proposed model can take advantage of the inter-relationships between data samples. Our approach leads to a significant reduction in manual labeling cost while achieving recognition performance similar to or better than that of a model trained on the entire training set. This is demonstrated through rigorous experimentation on five datasets.
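To make the notion of typicality concrete, the following is a minimal sketch (not the paper's actual method) of a typical-set-style selection rule: fit a simple diagonal-Gaussian density to the data, score each sample by how far its negative log-likelihood deviates from the dataset average (an empirical entropy estimate), and treat the least typical samples as labeling candidates. The Gaussian model, the deviation score, and the choice to select the most atypical samples are all illustrative assumptions.

```python
import numpy as np

def typicality_scores(X, eps=1e-6):
    """Score each sample by the deviation of its negative log-likelihood,
    under a diagonal-Gaussian fit, from the dataset's average NLL.
    A small score means the sample lies near the 'typical set'."""
    mu = X.mean(axis=0)
    var = X.var(axis=0) + eps  # eps avoids division by zero
    # Per-sample negative log-likelihood under N(mu, diag(var)).
    nll = 0.5 * (np.log(2 * np.pi * var) + (X - mu) ** 2 / var).sum(axis=1)
    entropy_est = nll.mean()  # average NLL approximates the entropy
    return np.abs(nll - entropy_est)

def select_informative(X, k):
    """Illustrative rule: pick the k least typical samples for labeling."""
    scores = typicality_scores(X)
    return np.argsort(scores)[-k:]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))   # toy unlabeled pool
idx = select_informative(X, 10)  # indices of 10 labeling candidates
```

In an active-learning loop, the selected indices would be sent to an annotator and the model retrained on the newly labeled subset; this sketch only shows the scoring step.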