Abstract
Most work on human activity analysis (i.e., recognition or
prediction) focuses on a single granularity: either
modelling global motion from coarse-level movement such as human trajectories, or forecasting detailed future actions from body-part movement such as skeleton motion. In contrast, we propose a multi-granularity interaction prediction network which integrates
both global motion and detailed local action. Built on a bidirectional LSTM network, the proposed method possesses
links between granularities that encourage feature sharing as well as cross-feature consistency between the global
and local granularities (e.g., trajectory and local action), and
in turn predicts the long-term global location and local dynamics of each individual. We validate our method on several
public datasets with promising performance.