Abstract
Recent work establishes dataset difficulty and
removes annotation artifacts via partial-input
baselines (e.g., hypothesis-only models for
SNLI or question-only models for VQA). When
a partial-input baseline achieves high accuracy, a
dataset is cheatable. However, the converse
is not necessarily true: the failure of a partial-input baseline does not mean a dataset is free
of artifacts. To illustrate this, we first design artificial datasets which contain trivial patterns
in the full input that are undetectable by any
partial-input model. Next, we identify such artifacts in the SNLI dataset: a hypothesis-only
model augmented with trivial patterns in the
premise can solve 15% of the examples that
were previously considered “hard”. Our work
provides a caveat for the use of partial-input
baselines for dataset verification and creation.
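
The central claim, that a trivial full-input pattern can be invisible to every partial-input model, can be illustrated with a toy sketch. The dataset below is a hypothetical XOR-style construction chosen for brevity and is not necessarily the artificial dataset used in this work; the helper name best_lookup_accuracy and the single-bit "premise" and "hypothesis" are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from itertools import product


def best_lookup_accuracy(examples, view):
    """Accuracy of the best possible classifier that sees only view(premise, hypothesis).

    For each distinct observed view, the optimal prediction is the majority label.
    """
    counts = defaultdict(Counter)
    for premise, hypothesis, label in examples:
        counts[view(premise, hypothesis)][label] += 1
    correct = sum(c.most_common(1)[0][1] for c in counts.values())
    return correct / len(examples)


# Hypothetical toy data: premise and hypothesis are single bits,
# and the label is their XOR (a trivial pattern in the full input).
examples = [(p, h, p ^ h) for p, h in product([0, 1], repeat=2)] * 25

hypothesis_only = best_lookup_accuracy(examples, lambda p, h: h)
premise_only = best_lookup_accuracy(examples, lambda p, h: p)
full_input = best_lookup_accuracy(examples, lambda p, h: (p, h))

print(hypothesis_only, premise_only, full_input)
```

In this sketch, both partial-input views are stuck at chance, while a model that sees both inputs solves every example. A partial-input baseline would therefore report no artifact even though the dataset is trivially solvable.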