PoseFlow: A Deep Motion Representation
for Understanding Human Behaviors in Videos
Abstract
Motion of the human body is the critical cue for understanding and characterizing human behavior in videos.
Most existing approaches exploit the motion cue using optical flow. However, optical flow usually contains the motion of
both the human bodies of interest and the undesired background. This “noisy” motion representation makes
pose estimation and action recognition very challenging in
real scenarios. To address this issue, this paper presents a
novel deep motion representation, called PoseFlow, which
reveals human motion in videos while suppressing background and motion blur and remaining robust to occlusion. To
learn PoseFlow at modest computational cost, we propose a functionally structured spatial-temporal deep network, PoseFlow Net (PFN), that jointly solves the skeleton localization and matching problems of PoseFlow. Comprehensive experiments show that PFN outperforms state-of-the-art deep flow estimation models in generating PoseFlow. Moreover, PoseFlow demonstrates its potential for
improving two challenging tasks in human video analysis:
pose estimation and action recognition.