Abstract
In this paper, we address the problem of 3D articulated multi-person tracking in busy street scenes from a moving, human-level observer. To handle the complexity of multi-person interactions, we propose a two-stage strategy. A multi-body detection-based tracker first analyzes the scene and recovers individual pedestrian trajectories, bridging sensor gaps and resolving temporary occlusions. A specialized articulated tracker is then applied to each recovered pedestrian trajectory in parallel to estimate the tracked person’s precise body pose over time. This articulated tracker is implemented in a Gaussian Process framework and operates on global pedestrian silhouettes using a learned statistical representation of human body dynamics. We interface the two tracking levels through a guided segmentation stage, which combines traditional bottom-up cues with top-down information from a human detector and the articulated tracker’s shape prediction. We show the proposed approach’s viability and demonstrate its performance for articulated multi-person tracking on several challenging video sequences of a busy inner-city scenario.