Abstract
Various visual tasks, such as the recognition of human actions, gestures, and facial expressions, and the classification of dynamic textures, require the modeling and representation of spatio-temporal information. In this paper, we propose representing space-time patterns using directional spatio-temporal oriented gradients. In the proposed approach, a 3D video patch is represented by a histogram of oriented gradients over nine symmetric spatio-temporal planes. Video comparison is achieved through a positive definite similarity kernel that is learnt by multiple kernel learning. A rich spatio-temporal descriptor with a simple trade-off between discriminatory power and invariance properties is thereby obtained. To evaluate the proposed approach, we consider three challenging visual recognition tasks, namely the classification of dynamic textures, human gestures, and human actions. Our evaluations indicate that the proposed approach attains significant improvements in recognition accuracy in comparison to state-of-the-art methods such as LBP-TOP, 3D-SIFT, HOG3D, tensor canonical correlation analysis, and dynamical fractal analysis.
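To make the descriptor idea concrete, the following is a minimal sketch of a per-plane oriented-gradient histogram for a 3D video patch. It is not the implementation evaluated in this paper. It assumes that the nine planes are the nine symmetry planes of a cube (three axis-aligned and six diagonal), that orientations are unsigned and quantized into eight bins, and that each histogram is L1-normalized. The names `PLANE_NORMALS`, `plane_basis`, and `directional_hog` are hypothetical and introduced only for illustration.

```python
import numpy as np

# Normals of the nine symmetry planes of a cube, in (x, y, t) order.
# Assumption: these stand in for the nine symmetric spatio-temporal planes.
PLANE_NORMALS = np.array([
    [1, 0, 0], [0, 1, 0], [0, 0, 1],
    [1, 1, 0], [1, -1, 0],
    [1, 0, 1], [1, 0, -1],
    [0, 1, 1], [0, 1, -1],
], dtype=float)


def plane_basis(normal):
    """Return two orthonormal vectors spanning the plane with this normal."""
    n = normal / np.linalg.norm(normal)
    # Seed Gram-Schmidt with the coordinate axis least aligned with n.
    seed = np.eye(3)[np.argmin(np.abs(n))]
    u = seed - np.dot(seed, n) * n
    u /= np.linalg.norm(u)
    v = np.cross(n, u)
    return u, v


def directional_hog(patch, n_bins=8):
    """Concatenate one magnitude-weighted orientation histogram per plane.

    patch: array of shape (T, H, W) holding grey-level intensities.
    Returns a vector of length 9 * n_bins.
    """
    # np.gradient returns derivatives along axes 0, 1, 2, i.e. t, y, x.
    gt, gy, gx = np.gradient(patch.astype(float))
    grads = np.stack([gx.ravel(), gy.ravel(), gt.ravel()], axis=1)
    hists = []
    for normal in PLANE_NORMALS:
        u, v = plane_basis(normal)
        a, b = grads @ u, grads @ v            # in-plane gradient components
        mag = np.hypot(a, b)                   # in-plane gradient magnitude
        ang = np.mod(np.arctan2(b, a), np.pi)  # unsigned orientation
        hist, _ = np.histogram(ang, bins=n_bins, range=(0.0, np.pi), weights=mag)
        total = hist.sum()
        hists.append(hist / total if total > 0 else hist)
    return np.concatenate(hists)
```

Under these assumptions, a patch of shape `(T, H, W)` yields a 72-dimensional vector (nine planes times eight bins). Keeping the nine histograms as separate blocks is what would allow a multiple kernel learning stage to assign each plane its own weight.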