Abstract
In this paper, a multi-feature max-margin hierarchical Bayesian model (M3HBM) is proposed for action recognition. Unlike existing methods, which separate representation and classification into two steps, M3HBM jointly learns a high-level representation by combining a hierarchical generative model (HGM) and discriminative max-margin classifiers in a unified Bayesian framework. Specifically, HGM represents actions as distributions over latent spatial-temporal patterns (STPs), which are learned from multiple feature modalities and shared among different classes. For recognition, we employ Gibbs classifiers to minimize the expected loss function based on the max-margin principle, and use the classifiers as regularization terms of M3HBM so that Bayesian estimation of the classifier parameters is performed together with the learning of STPs. In addition, multi-task learning is applied to learn the model from multiple feature modalities for different classes. For test videos, we obtain the representations through the inference process and perform action recognition with the learned Gibbs classifiers. For learning and inference, we derive an efficient Gibbs sampling algorithm to solve the proposed M3HBM. Extensive experiments on several datasets demonstrate both the representation power and the classification capability of our approach for action recognition.