Abstract
The speed with which intelligent systems can react to an ac- tion depends on how soon it can be recognized. The ability to recognize ongoing actions is critical in many applications, for example, spotting criminal activity. It is challenging, since decisions have to be made based on partial videos of temporally incomplete action executions. In this pa- per, we propose a novel discriminative multi-scale model for predicting the action class from a partially observed video. The proposed model captures temporal dynamics of human actions by explicitly considering all the history of observed features as well as features in smaller tem- poral segments. We develop a new learning formulation, which elegantly captures the temporal evolution over time, and enforces the label consis- tency between segments and corresponding partial videos. Experimental results on two public datasets show that the proposed approach outper- forms state-of-the-art action prediction methods.