Abstract. Temporal action proposal generation is an important task:
akin to object proposals, temporal action proposals are intended to capture “clips” or temporal intervals in videos that are likely to contain an
action. Previous methods can be divided into two groups: sliding window
ranking and actionness score grouping. Sliding windows uniformly cover
all segments in videos, but the temporal boundaries are imprecise; grouping-based methods may have more precise boundaries, but they may omit
some proposals when the quality of the actionness scores is low. Based on the
complementary characteristics of these two methods, we propose a novel
Complementary Temporal Action Proposal (CTAP) generator. Specifically,
we apply a Proposal-level Actionness Trustworthiness Estimator
(PATE) on the sliding window proposals to generate the probabilities
indicating whether the actions can be correctly detected by actionness
scores; the windows with high scores are collected. The collected sliding windows and actionness proposals are then processed by a temporal
convolutional neural network for proposal ranking and boundary adjustment. CTAP outperforms state-of-the-art methods on average recall
(AR) by a large margin on the THUMOS-14 and ActivityNet 1.3 datasets.
We further apply CTAP as a proposal generation method in an existing
action detector and show consistent, significant improvements.
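
As an illustration of the average recall (AR) metric mentioned above, the following is a minimal sketch of how recall over temporal intervals is commonly computed, assuming that a ground-truth action counts as recalled when at least one proposal overlaps it with a temporal intersection-over-union (tIoU) at or above a threshold. The function names, the (start, end) interval representation, and the threshold values are illustrative assumptions, not the evaluation code used for the reported results.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: an interval is (start, end) in seconds.
Interval = Tuple[float, float]


def temporal_iou(a: Interval, b: Interval) -> float:
    """Intersection-over-union of two temporal intervals."""
    intersection = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - intersection
    return intersection / union if union > 0 else 0.0


def recall_at(proposals: List[Interval],
              ground_truth: List[Interval],
              threshold: float) -> float:
    """Fraction of ground-truth intervals matched by at least one proposal."""
    if not ground_truth:
        return 0.0
    hits = sum(
        1 for gt in ground_truth
        if any(temporal_iou(p, gt) >= threshold for p in proposals)
    )
    return hits / len(ground_truth)


def average_recall(proposals: List[Interval],
                   ground_truth: List[Interval],
                   thresholds: Sequence[float]) -> float:
    """Mean of recall over a set of tIoU thresholds (assumed, not prescribed)."""
    return sum(recall_at(proposals, ground_truth, t) for t in thresholds) / len(thresholds)


# Toy data, invented for illustration only.
proposals = [(0.0, 10.0), (20.0, 30.0)]
ground_truth = [(0.0, 10.0), (22.0, 30.0), (50.0, 60.0)]
ar = average_recall(proposals, ground_truth, thresholds=[0.5, 0.9])
```

In this toy case the first proposal matches its ground-truth interval exactly, the second overlaps its target only partially, and the third action has no proposal at all, so recall drops as the tIoU threshold tightens. Imprecise boundaries and omitted proposals both lower AR, which is why the two proposal families are complementary.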