Abstract
Obtaining labels can be expensive or time-consuming, but unlabeled data is often abundant and easier to obtain. Most learning tasks can be made more efficient, in terms of labeling cost, by intelligently choosing specific unlabeled instances to be labeled by an oracle. The general problem of optimally choosing these instances is known as active learning. As it is usually set in the context of supervised learning, active learning relies on a single oracle playing the role of a teacher. We focus on the multiple-annotator scenario, where an oracle who knows the ground truth no longer exists; instead, multiple labelers with varying expertise are available for querying. This paradigm poses new challenges for active learning: we can now ask not only which data sample should be labeled next, but also which annotator should be queried to benefit our learning model the most. In this paper, we employ a probabilistic model for learning from multiple annotators that can also learn each annotator's expertise, even when that expertise is not consistent across the task domain. We then focus on providing a criterion and formulation that allow us to select both a sample and the annotator(s) from whom to query its label.