Abstract
We investigate feature design and classification architectures for temporal action localization. This application focuses on detecting and labeling actions in untrimmed videos, which is more challenging than classifying pre-segmented videos. The main difficulties in action localization are the uncertainty about when actions occur and the need to use information from different temporal scales. Two innovations are proposed to address these difficulties. First, we propose a Pyramid of Score Distribution Feature (PSDF) to capture motion information at multiple resolutions centered at each detection window. This novel feature mitigates the influence of unknown action position and duration, and shows a significant performance gain over previous detection approaches. Second, inter-frame consistency is further explored by incorporating PSDF into state-of-the-art Recurrent Neural Networks, which gives an additional performance gain in detecting actions in temporally untrimmed videos. We tested our action localization framework on the THUMOS’15 and MPII Cooking Activities datasets, and on both it shows a large performance improvement over previous attempts.