Abstract
Low-dimensional embeddings that capture the main variations of interest in collections of data are important for
many applications. One way to construct these embeddings
is to acquire estimates of similarity from the crowd. Similarity is a multi-dimensional concept that varies from individual to individual. However, existing models for learning
crowd embeddings typically make simplifying assumptions,
such as that all individuals estimate similarity using the same
criteria, that the list of criteria is known in advance, or that
crowd workers are not influenced by the data that they see.
To overcome these limitations, we introduce Context Embedding Networks (CENs). In addition to learning interpretable embeddings from images, CENs also model worker
biases for different attributes along with the visual context, i.e., the attributes highlighted by a set of images. Experiments on three noisy crowd-annotated datasets show
that modeling both worker bias and visual context results
in more interpretable embeddings compared to existing approaches.