Abstract
One of the key challenges in human action recognition from video sequences is how to model an action sufficiently. In this paper we therefore propose a novel motion-based representation, called Motion Context (MC), which is insensitive to the scale and direction of an action, by employing image representation techniques. An MC captures the distribution of motion words (MWs) over relative locations in a local region of the motion image (MI) around a reference point, and thus summarizes the local motion information in a rich 3D MC descriptor. In this way, any human action can be represented as a 3D descriptor by summing up all the MC descriptors of that action. For action recognition, we propose four different recognition configurations, including w3-pLSA (a new direct graphical model extending pLSA) and MC+SVM. We test our approach on two human action video datasets, from KTH and the Weizmann Institute of Science (WIS), and the results are quite promising. On the KTH dataset, the proposed MC representation achieves its highest performance with the proposed w3-pLSA. On the WIS dataset, the best performance of the proposed MC is comparable to the state of the art.