Abstract
Human motion modeling is a classic problem in computer vision and graphics. Challenges in modeling human
motion include high dimensional prediction as well as extremely complicated dynamics.We present a novel approach
to human motion modeling based on convolutional neural
networks (CNN). The hierarchical structure of CNN makes
it capable of capturing both spatial and temporal correlations effectively. In our proposed approach, a convolutional
long-term encoder is used to encode the whole given motion
sequence into a long-term hidden variable, which is used
with a decoder to predict the remainder of the sequence.
The decoder itself also has an encoder-decoder structure, in
which the short-term encoder encodes a shorter sequence to
a short-term hidden variable, and the spatial decoder maps
the long and short-term hidden variable to motion predictions. By using such a model, we are able to capture both
invariant and dynamic information of human motion, which
results in more accurate predictions. Experiments show that
our algorithm outperforms the state-of-the-art methods on
the Human3.6M and CMU Motion Capture datasets. Our
code is available at the project website