Action Search: Spotting Actions in Videos and
Its Application to Temporal Action Localization
Abstract. State-of-the-art temporal action detectors inefficiently search
the entire video for specific actions. Despite the encouraging progress
these methods achieve, it is crucial to design automated approaches that
explore only the parts of the video most relevant to the actions being
searched for. To address this need, we propose the new problem
of action spotting in video, which we define as finding a specific action
in a video while observing a small portion of that video. Inspired by the
observation that humans are extremely efficient and accurate in spotting
and finding action instances in video, we propose Action Search, a novel
Recurrent Neural Network approach that mimics the way humans spot
actions. Moreover, to address the absence of data recording the behavior
of human annotators, we put forward the Human Searches dataset, which
compiles the search sequences employed by human annotators spotting
actions in the AVA and THUMOS14 datasets. We consider temporal
action localization as an application of the action spotting problem. Experiments on the THUMOS14 dataset reveal that our model not only
explores the video efficiently (observing on average 17.3% of the
video) but also accurately finds human activities, achieving 30.8% mAP.
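To make the search loop the abstract describes concrete, the following is a minimal, illustrative sketch of a recurrent spotting policy: a model that repeatedly observes a visual feature at its current temporal position, updates a hidden state, and predicts the next position to inspect, stopping once its prediction converges. PyTorch is assumed, and every identifier, layer size, and the get_feature helper are hypothetical assumptions for illustration, not the paper's actual architecture or training procedure.

    import torch
    import torch.nn as nn

    class ActionSearchSketch(nn.Module):
        # Recurrent search policy: observe a feature at the current temporal
        # position, update the hidden state, and predict where to look next.
        def __init__(self, feat_dim=512, hidden_dim=256):
            super().__init__()
            self.cell = nn.LSTMCell(feat_dim + 1, hidden_dim)  # input: feature + position
            self.next_pos = nn.Linear(hidden_dim, 1)           # next position in [0, 1]

        def search(self, get_feature, start=0.5, max_steps=10, tol=0.02):
            # get_feature(pos) -> (feat_dim,) tensor; a hypothetical helper that
            # extracts a frame feature at normalized video position pos.
            pos = torch.tensor([start])
            h = torch.zeros(1, self.cell.hidden_size)
            c = torch.zeros(1, self.cell.hidden_size)
            visited = [pos.item()]
            for _ in range(max_steps):
                x = torch.cat([get_feature(pos.item()).view(1, -1), pos.view(1, 1)], dim=1)
                h, c = self.cell(x, (h, c))
                nxt = torch.sigmoid(self.next_pos(h)).view(1)
                if abs(nxt.item() - pos.item()) < tol:  # prediction converged: action spotted
                    break
                pos = nxt
                visited.append(pos.item())
            return visited  # observed positions; the last is the spotted location

In the paper's setting, such a model would be trained to imitate the human search sequences recorded in the Human Searches dataset; the convergence-based stopping rule above is only one plausible spotting criterion.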