Abstract. While large datasets have proven to be a key enabler for
progress in computer vision, they can have biases that lead to erroneous
conclusions. The notion of the representation bias of a dataset is proposed
to combat this problem. It captures the fact that representations other
than the ground-truth representation can achieve good performance on
any given dataset. When this is the case, the dataset is said not to be
well calibrated. Dataset calibration is shown to be a necessary condition
for the standard state-of-the-art evaluation practice to converge to the
ground-truth representation. A procedure, RESOUND, is proposed to
quantify and minimize representation bias. Its application to the
problem of action recognition shows that current datasets are biased
towards static representations (objects, scenes and people). Two
versions of RESOUND are studied. An explicit RESOUND procedure is
proposed to assemble new datasets by sampling existing datasets. An
implicit RESOUND procedure is used to guide the creation of a new
dataset, Diving48, of over 18,000 video clips of competitive diving
actions, spanning
48 fine-grained dive classes. Experimental evaluation confirms the
effectiveness of RESOUND in reducing the static biases of current
datasets.