Skeleton-Based Action Recognition with Spatial Reasoning and Temporal Stack Learning
Abstract. Skeleton-based action recognition has made great progress recently, but many problems remain unsolved. For example, the representations of skeleton sequences captured by most previous methods lack spatial structure information and detailed temporal dynamics features. In this paper, we propose a novel model with spatial reasoning and temporal stack learning (SR-TSL) for skeleton-based action recognition, which consists of a spatial reasoning network (SRN) and a temporal stack learning network (TSLN). The SRN captures high-level spatial structural information within each frame using a residual graph neural network, while the TSLN models the detailed temporal dynamics of skeleton sequences through a composition of multiple skip-clip LSTMs. To optimize the model during training, we propose a clip-based incremental loss. We perform extensive experiments on the SYSU 3D Human-Object Interaction dataset and the NTU RGB+D dataset and verify the effectiveness of each network in our model. The comparison results show that our approach achieves much better results than state-of-the-art methods.
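The abstract does not give the SRN's update rule, so the following is only a hedged illustration of what a generic residual graph update looks like, not the SR-TSL architecture. It treats body parts as graph nodes and adds a ReLU-activated, aggregated neighbor message to each node's current state. The function name `residual_graph_step`, the ReLU message, the scalar `weight`, and the three-node example graph are all hypothetical choices made for this sketch.

```python
# Hypothetical sketch of one residual message-passing step over a small graph.
# This is a generic illustration, NOT the SRN defined in the SR-TSL paper.
from typing import Dict, List

Vector = List[float]


def relu(x: Vector) -> Vector:
    # Element-wise ReLU on a plain list of floats.
    return [max(0.0, v) for v in x]


def residual_graph_step(
    states: Dict[int, Vector],
    adjacency: Dict[int, List[int]],
    weight: float = 0.5,  # hypothetical scalar in place of a learned matrix
) -> Dict[int, Vector]:
    """Return h_i + relu(weight * sum of neighbor states) for every node i."""
    new_states: Dict[int, Vector] = {}
    for node, h in states.items():
        dim = len(h)
        # Sum the current states of all neighbors (message aggregation).
        agg = [0.0] * dim
        for nb in adjacency.get(node, []):
            for d in range(dim):
                agg[d] += states[nb][d]
        message = relu([weight * v for v in agg])
        # Residual connection: keep the node's own state and add the message.
        new_states[node] = [h[d] + message[d] for d in range(dim)]
    return new_states


if __name__ == "__main__":
    # Hypothetical body-part graph: 0 = torso, 1 = left arm, 2 = right arm.
    adjacency = {0: [1, 2], 1: [0], 2: [0]}
    states = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [-4.0, 0.0]}
    print(residual_graph_step(states, adjacency))
```

Because each node keeps its own state and only adds the neighbor message, a node whose message is zeroed out by the ReLU passes through unchanged. This identity path is the usual motivation for residual connections in deeper graph networks.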