Abstract
Model selection is treated as a standard performance-boosting step in many machine learning applications. Once all other properties of a learning
problem are fixed, the model is selected by grid
search on a held-out validation set. This is strictly
inapplicable to active learning. Within the standardized workflow, the acquisition function is chosen a priori from the available heuristics, and its success can be observed only after the labeling budget is
already exhausted. More importantly, no earlier study reports an acquisition heuristic that is consistently successful enough to stand out
as the unique best choice. We present a method
to break this vicious circle by defining the acquisition function as a learning predictor and training it
with reinforcement feedback collected from each labeling round. As active learning is a scarce-data
regime, we bootstrap from a well-known heuristic that filters the bulk of data points on which all
heuristics would agree, and learn a policy to warp
the top portion of this ranking in the way most beneficial
for the characteristics of a specific data distribution.
Our system consists of a Bayesian neural net serving as the
predictor, a bootstrap acquisition function, a probabilistic state definition, and a Bayesian policy network that can effectively incorporate this state distribution as input. We observe on three benchmark
data sets that our method always manages either to invent a new, superior acquisition function or to
adapt itself to the a priori unknown best-performing
heuristic for each specific data set.
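
The filter-then-re-rank idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes predictive entropy as the bootstrap heuristic (the abstract does not name which heuristic is used), and it takes the policy as a plain scoring function rather than a trained Bayesian policy network. The names `predictive_entropy`, `bootstrap_filter`, `select_with_policy`, and `policy_scores_fn` are hypothetical.

```python
import numpy as np


def predictive_entropy(probs):
    """Entropy of each row of class probabilities, shape (n_points, n_classes)."""
    p = np.clip(probs, 1e-12, 1.0)  # avoid log(0)
    return -(p * np.log(p)).sum(axis=1)


def bootstrap_filter(probs, k):
    """Assumed bootstrap heuristic: indices of the k highest-entropy pool points."""
    scores = predictive_entropy(probs)
    return np.argsort(-scores)[:k]  # most uncertain first


def select_with_policy(probs, k, policy_scores_fn):
    """Re-rank the filtered top-k with a policy and return one pool index to label."""
    candidates = bootstrap_filter(probs, k)
    # Stand-in for the learned policy: any function mapping candidate
    # predictive distributions to one score per candidate.
    policy_scores = policy_scores_fn(probs[candidates])
    return int(candidates[np.argmax(policy_scores)])


# Toy pool of four unlabeled points with two-class predictive distributions.
pool_probs = np.array([[0.5, 0.5], [0.9, 0.1], [0.6, 0.4], [0.99, 0.01]])
# Placeholder policy (an assumption): prefer a higher first-class probability.
chosen = select_with_policy(pool_probs, k=2, policy_scores_fn=lambda p: p[:, 0])
```

In this sketch the heuristic only narrows the pool to the candidates it ranks highest, and the policy makes the final choice among them. Training the policy from per-round reinforcement feedback, as the abstract describes, is outside the scope of the example.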