Obtaining High-Quality Labels by Distinguishing between
Easy and Hard Items in Crowdsourcing
Abstract
Crowdsourcing systems make it possible to hire
voluntary workers to label large-scale data by offering them small monetary payments. Usually, the
taskmaster needs to collect high-quality labels,
but the quality of labels obtained from the crowd
may not satisfy this requirement. In this paper, we
study the problem of obtaining high-quality labels
from the crowd and present an approach that learns
the difficulty of items in crowdsourcing: we construct a small training set of items with estimated difficulties and then learn a model to predict the difficulty of future items. With the predicted difficulty, we can distinguish between easy and
hard items to obtain high-quality labels. For easy
items, the labels inferred from the
crowd can be of high enough quality to satisfy the requirement, whereas for hard items, where the crowd cannot
provide high-quality labels, it is better to choose
a more knowledgeable crowd or employ specialized
workers to label them. The experimental results
demonstrate that the proposed approach, by learning to distinguish between easy and hard items,
significantly improves label quality.
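
To make the described pipeline concrete, the following Python sketch (not the authors' implementation) estimates item difficulty on a small seed set from worker disagreement, trains a classifier to predict which future items are hard, routes easy items to crowd majority voting, and defers hard items to specialized workers. The function names, the disagreement-based difficulty proxy, the feature representation, and the 0.5 hardness threshold are all illustrative assumptions.

```python
# Minimal sketch, assuming items are represented by feature vectors and each
# seed item already has multiple crowd labels. Difficulty is approximated by
# worker disagreement; the threshold and model choice are assumptions.
from collections import Counter
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimated_difficulty(labels):
    """Proxy difficulty: 1 - fraction of workers agreeing with the majority label."""
    counts = Counter(labels)
    return 1.0 - counts.most_common(1)[0][1] / len(labels)

def majority_vote(labels):
    """Aggregate crowd labels for an (easy) item by majority voting."""
    return Counter(labels).most_common(1)[0][0]

def train_difficulty_model(seed_features, seed_worker_labels, hard_threshold=0.5):
    """Fit a binary classifier that predicts whether an item is hard."""
    y = np.array([estimated_difficulty(ls) >= hard_threshold
                  for ls in seed_worker_labels])
    return LogisticRegression().fit(seed_features, y)

def route_items(model, features, crowd_labels):
    """Return (item_index, label_or_None, route) for each new item."""
    results = []
    for i, hard in enumerate(model.predict(features)):
        if hard:
            # Hard item: defer to a more knowledgeable crowd or expert workers.
            results.append((i, None, "expert"))
        else:
            # Easy item: the aggregated crowd label is expected to be reliable.
            results.append((i, majority_vote(crowd_labels[i]), "crowd"))
    return results
```

In this sketch the routing decision is a hard threshold on the predicted class; a predicted difficulty score could equally be used to rank items and send only the top fraction to experts under a labeling budget.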