Abstract
In many machine learning applications, crowdsourcing has become the primary means for label collection. In this paper, we study the optimal error rate for aggregating labels provided by a set of non-expert workers. Under the classic Dawid-Skene model, we establish matching upper and lower bounds with an exact exponent mI(π), in which m is the number of workers and I(π) is the average Chernoff information that characterizes the workers' collective ability. Such an exact characterization of the error exponent allows us to state a precise sample size requirement in order to achieve a given misclassification error. In addition, our results imply the optimality of various EM algorithms for crowdsourcing initialized by consistent estimators.