Abstract
We seek to understand the arrow of time in videos – what
makes videos look like they are playing forwards or backwards? Can we visualize the cues? Can the arrow of time
be a supervisory signal useful for activity analysis? To this
end, we build three large-scale video datasets and apply a
learning-based approach to these tasks.
To learn the arrow of time efficiently and reliably, we design a ConvNet suitable for extended temporal footprints
and for class activation visualization, and study the effect of artificial cues, such as cinematographic conventions, on learning. Our trained model achieves state-of-the-art performance on large-scale real-world video datasets.
Through cluster analysis and localization of important regions for the prediction, we examine learned visual cues
that are consistent among many samples and show when
and where they occur. Lastly, we use the trained ConvNet
for two applications: self-supervision for action recognition, and video forensics – determining whether Hollywood
film clips have been deliberately reversed in time, a technique often used
as a special effect.