Abstract
The objective of this paper is to recognize gestures in videos, both localizing the gesture and classifying it into one of multiple classes. We show that the performance of a gesture classifier learnt from a single (strongly supervised) training example can be boosted significantly using a ‘reservoir’ of weakly supervised gesture examples (and that the performance exceeds learning from the one-shot example or reservoir alone). The one-shot example and weakly supervised reservoir are from different ‘domains’ (different people, different videos, continuous or non-continuous gesturing, etc.), and we propose a domain adaptation method for human pose and hand shape that enables gesture learning methods to generalise between them. We also show the benefits of using the recently introduced Global Alignment Kernel [12] instead of the standard Dynamic Time Warping that is generally used for time alignment. The domain adaptation and learning methods are evaluated on two large-scale, challenging gesture datasets: one for sign language, and the other for Italian hand gestures. In both cases performance exceeds the previously published results, including the best skeleton-classification-only entry in the 2013 ChaLearn challenge.