Abstract
This paper presents the first study on forecasting human
dynamics from static images. Given a single RGB image, the task is to generate a sequence of upcoming human body poses in 3D. To address this problem, we propose
the 3D Pose Forecasting Network (3D-PFNet). Our 3D-PFNet integrates recent advances in single-image human
pose estimation and sequence prediction, and converts the
2D predictions into 3D space. We train our 3D-PFNet using
a three-step training strategy to leverage diverse sources
of training data, including image- and video-based human
pose datasets and 3D motion capture (MoCap) data. We
demonstrate competitive performance of our 3D-PFNet on
2D pose forecasting and 3D pose recovery through quantitative and qualitative results.