Abstract. We explore an approach, based on a recurrent encoder-decoder framework, to forecasting human motion in a few milliseconds given an input 3D skeleton sequence. Current approaches suffer from prediction
discontinuities and may fail to predict human-like motion over longer time horizons due to error accumulation. We address these critical issues by incorporating local geometric structure constraints and regularizing predictions with plausible temporal smoothness and continuity from a global perspective. Specifically,
rather than using the conventional Euclidean loss, we propose a novel framewise geodesic loss as a geometrically meaningful, more precise distance measurement. Moreover, inspired by the adversarial training mechanism, we present
a new learning procedure to simultaneously validate the sequence-level plausibility of the prediction and its coherence with the input sequence by introducing
two global recurrent discriminators. An unconditional, fidelity discriminator and
a conditional, continuity discriminator are jointly trained along with the predictor in an adversarial manner. Our resulting adversarial geometry-aware encoder-decoder (AGED) model significantly outperforms state-of-the-art deep learning-based
approaches on the heavily benchmarked H3.6M dataset in both short-term
and long-term predictions.
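
To make the contrast between a Euclidean loss and a frame-wise geodesic loss concrete, the following is a minimal sketch, not the formulation used in this work. It assumes each joint is represented by a 3x3 rotation matrix and uses the standard geodesic distance on SO(3), namely the rotation angle of the relative rotation. The function names, the (frames, joints, 3, 3) array layout, and the averaging over frames and joints are all illustrative assumptions.

```python
import numpy as np


def rotation_z(theta):
    """Hypothetical helper: rotation by `theta` radians about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def geodesic_distance(r_pred, r_true):
    """Geodesic distance on SO(3) between rotation matrices of shape (..., 3, 3).

    This is the angle of the relative rotation r_pred @ r_true^T, recovered
    from its trace via trace(R) = 1 + 2 * cos(angle).
    """
    # Batched r_pred @ r_true^T over any leading dimensions.
    r_rel = np.einsum("...ij,...kj->...ik", r_pred, r_true)
    cos_angle = (np.trace(r_rel, axis1=-2, axis2=-1) - 1.0) / 2.0
    # Clip to guard against round-off pushing the value outside [-1, 1].
    return np.arccos(np.clip(cos_angle, -1.0, 1.0))


def framewise_geodesic_loss(pred, true):
    """Assumed loss: mean geodesic distance over all frames and joints.

    `pred` and `true` have shape (frames, joints, 3, 3).
    """
    return float(np.mean(geodesic_distance(pred, true)))


# Two frames, one joint: predictions are off by 0.2 rad and 0.0 rad.
pred = np.stack([rotation_z(0.3), rotation_z(1.0)])[:, None]
true = np.stack([rotation_z(0.1), rotation_z(1.0)])[:, None]
loss = framewise_geodesic_loss(pred, true)
```

Unlike a Euclidean distance on raw angle parameters, this measure depends only on the relative rotation, so it is unaffected by how a given rotation happens to be parameterized.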