Abstract

We describe novel but simple motion features for the problem of detecting objects in video sequences. Previous approaches either compute optical flow or temporal differences on video frame pairs with various assumptions about stabilization. We describe a combined approach that uses coarse-scale flow and fine-scale temporal difference features. Our approach performs weak motion stabilization by factoring out camera motion and coarse object motion while preserving nonrigid motions that serve as useful cues for recognition. We show results for pedestrian detection and human pose estimation in video sequences, achieving state-of-the-art results in both. In particular, given a fixed detection rate, our method achieves a five-fold reduction in false positives over prior art on the Caltech Pedestrian benchmark. Finally, we perform extensive diagnostic experiments to reveal what aspects of our system are crucial for good performance. Proper stabilization, long time-scale features, and proper normalization are all critical.