Abstract
Despite recent advances in large-scale video analysis, action detection remains one of the most challenging unsolved problems in computer vision. This difficulty stems in
part from the large volume of data that must be analyzed to detect actions in videos. Existing approaches have
mitigated the computational cost, but these methods still
lack the rich high-level semantics that would help them localize
actions quickly. In this paper, we introduce a Semantic Cascade Context (SCC) model that aims to detect actions in long video sequences. By embracing semantic priors associated with human activities, SCC produces high-quality class-specific action proposals and prunes unrelated
activities in a cascade fashion. Experimental results on ActivityNet show that SCC achieves state-of-the-art performance for action detection while operating in real time.