Abstract
Long Short-Term Memory (LSTM) networks have shown
superior performance in 3D human action recognition due
to their power in modeling the dynamics and dependencies
in sequential data. Since not all joints are informative for action analysis, and irrelevant joints often introduce considerable noise, we need to pay more attention to the informative ones. However, the original LSTM does not have a strong attention capability. Hence we propose a new class of LSTM network, the Global Context-Aware Attention LSTM (GCA-LSTM), for 3D action recognition, which is able to selectively focus on the informative joints in the action sequence
with the assistance of global contextual information. In order to achieve a reliable attention representation for the
action sequence, we further propose a recurrent attention mechanism for our GCA-LSTM network, in which the attention representation is refined iteratively. Experiments show
that our end-to-end network can reliably focus on the most
informative joints in each frame of the skeleton sequence.
Moreover, our network yields state-of-the-art performance
on three challenging datasets for 3D action recognition.
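To make the attention idea concrete, the following is a minimal sketch, in PyTorch, of attention over skeleton joints guided by a global context vector that is refined over a few iterations. The module name `GlobalContextAttention`, the mean-pooled context initialization, the concatenation-based scoring function, and all shapes are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class GlobalContextAttention(nn.Module):
    """Scores each joint feature against a global context vector
    and returns an attention-weighted summary of the joints."""

    def __init__(self, feat_dim: int):
        super().__init__()
        # Assumed scoring function: a linear layer over the
        # concatenation of a joint feature and the global context.
        self.score = nn.Linear(2 * feat_dim, 1)

    def forward(self, joint_feats: torch.Tensor,
                context: torch.Tensor) -> torch.Tensor:
        # joint_feats: (batch, num_joints, feat_dim)
        # context:     (batch, feat_dim)
        ctx = context.unsqueeze(1).expand_as(joint_feats)
        scores = self.score(torch.cat([joint_feats, ctx], dim=-1))
        weights = torch.softmax(scores, dim=1)       # per-joint attention
        return (weights * joint_feats).sum(dim=1)    # attended summary

def recurrent_attention(joint_feats: torch.Tensor,
                        attn: GlobalContextAttention,
                        num_iters: int = 2) -> torch.Tensor:
    # Initialize the global context by mean-pooling over all joints,
    # then iteratively refine it with the attended representation,
    # mirroring the iterative improvement described above.
    context = joint_feats.mean(dim=1)
    for _ in range(num_iters):
        context = attn(joint_feats, context)
    return context

# Usage with hypothetical dimensions (e.g., 25 skeleton joints):
attn = GlobalContextAttention(feat_dim=128)
feats = torch.randn(4, 25, 128)
summary = recurrent_attention(feats, attn)   # (4, 128)
```

In this sketch, each refinement pass recomputes the per-joint weights with a sharper context, so joints that are informative for the action increasingly dominate the final representation.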