Abstract
We introduce the first dataset for human edits
of machine-generated visual stories and explore how these collected edits may be used
for the visual story post-editing task. The
dataset, VIST-Edit, includes 14,905 human-edited versions of 2,981 machine-generated
visual stories. The stories were generated by
two state-of-the-art visual storytelling models,
with each story aligned to 5 human-edited versions. We
establish baselines for the task, showing how
a relatively small set of human edits can be
leveraged to boost the performance of large
visual storytelling models. We also discuss
the weak correlation between automatic evaluation scores and human ratings, motivating the
need for new automatic metrics.