Transition Forests: Learning Discriminative Temporal Transitions for Action
Recognition and Detection
Abstract
A human action can be seen as transitions between one’s
body poses over time, where the transition depicts a temporal relation between two poses. Recognizing actions thus
involves learning a classifier sensitive to these pose transitions as well as to static poses. In this paper, we introduce a
novel method called transition forests, an ensemble of decision trees that learn to discriminate both static poses and
transitions between pairs of independent frames. During training, node splitting is driven by alternating between two criteria: the standard classification objective that maximizes
the discriminative power in individual frames, and the proposed objective that maximizes it in pairwise frame transitions. Growing the trees
tends to group frames that have similar associated transitions and share the same action label, thereby incorporating temporal
information that would otherwise be unavailable. Unlike conventional decision trees, where the best split in a node is determined independently of other nodes, transition forests
try to find the best splits of nodes jointly (within a layer) for
incorporating distant node transitions. To infer the
class label of a new frame, the frame is passed down the trees, and
the prediction is made from the previous frame predictions
and the current one in an efficient, online manner. We
apply our method to a variety of skeleton-based action recognition and
online detection datasets, showing its suitability compared with several
baselines and state-of-the-art approaches.
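
To make the online inference idea concrete, the following is a minimal sketch of combining the current frame's prediction with those of previous frames. It is an illustration under stated assumptions, not the method of this paper: the class name OnlinePredictor, the fixed window size, and the plain averaging of per-frame class posteriors are all hypothetical, and the per-frame posteriors are assumed to come from some already-trained forest.

```python
from collections import deque


class OnlinePredictor:
    """Hypothetical helper: smooths per-frame class posteriors over a short window.

    Assumption: each incoming posterior was produced by passing one frame
    down an already-trained forest; how it was produced is outside this sketch.
    """

    def __init__(self, num_classes, window=3):
        self.num_classes = num_classes
        # Only the last `window` posteriors are kept, so memory stays constant.
        self.history = deque(maxlen=window)

    def update(self, frame_posterior):
        """Add the current frame's posterior and return (label, averaged posterior)."""
        if len(frame_posterior) != self.num_classes:
            raise ValueError("posterior length must equal num_classes")
        self.history.append(list(frame_posterior))
        n = len(self.history)
        # Plain average over the stored frames (an assumed combination rule).
        averaged = [
            sum(p[c] for p in self.history) / n for c in range(self.num_classes)
        ]
        label = max(range(self.num_classes), key=averaged.__getitem__)
        return label, averaged


if __name__ == "__main__":
    predictor = OnlinePredictor(num_classes=2, window=2)
    # Made-up posteriors for three consecutive frames.
    for posterior in ([0.9, 0.1], [0.4, 0.6], [0.1, 0.9]):
        print(predictor.update(posterior))
```

Because only a bounded window of past predictions is stored, each update costs time proportional to the window size times the number of classes, which is what makes this kind of scheme usable frame by frame on a stream.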