Learning and Refining of Privileged Information-based RNNs for Action
Recognition from Depth Sequences
Abstract
Existing RNN-based approaches for action recognition
from depth sequences require either skeleton joints or handcrafted depth features as inputs. An end-to-end approach
that maps raw depth maps directly to action classes is nontrivial to design for two reasons: 1) a single-channel depth map
lacks texture, which weakens its discriminative power; 2) depth training data are relatively scarce. To address these
challenges, we propose to learn an RNN driven by privileged information (PI) in three steps. First, an encoder is pretrained to learn a joint embedding of depth appearance and
PI (i.e., skeleton joints). The learned embedding layers are
then tuned in the learning step, which optimizes the network by exploiting PI through a multi-task loss. However, exploiting PI as a secondary task
does little to improve performance on the primary task (i.e., classification)
because of the gap between the two tasks. Finally, a bridging
matrix is defined in the refining step to connect the two tasks by discovering latent PI. Our PI-based classification loss
maintains consistency between the latent PI and the predicted
distribution. The latent PI and the network are iteratively estimated and updated in an expectation-maximization procedure. The proposed learning process provides greater discriminative power for modeling subtle depth differences, while
helping avoid overfitting the scarce training data. Our experiments show significant performance gains over state-of-the-art methods on three public benchmark datasets and
our newly collected Blanket dataset.