Abstract
We introduce a novel approach to automatically learn intuitive and compact descriptors of human body motions for activity recognition. Each action descriptor is produced, first, by applying Temporal Laplacian Eigenmaps to view-dependent videos in order to produce a style-invariant embedded manifold for each view separately. Then, all view-dependent manifolds are automatically combined to discover a unified representation which models an action in a single three-dimensional space, independently of style and viewpoint. In addition, a bidirectional nonlinear mapping function is incorporated to allow projecting actions between the original and embedded spaces. The proposed framework is evaluated on a real and challenging dataset (IXMAS), which is composed of a variety of actions seen from arbitrary viewpoints. Experimental results demonstrate robustness against style and view variation and match the performance of the most accurate action recognition method.