Learning Rigidity in Dynamic Scenes with a
Moving Camera for 3D Motion Field Estimation
Abstract. Estimation of 3D motion in a dynamic scene from a temporal pair of images is a core task in many scene understanding problems.
In real-world applications, a dynamic scene is commonly captured by a
moving camera (i.e., panning, tilting or hand-held), increasing the task
complexity because the scene is observed from different viewpoints. The
primary challenge is the disambiguation of the camera motion from scene
motion, which becomes more difficult as the amount of rigidity observed
decreases, even with successful estimation of 2D image correspondences.
Compared to other state-of-the-art 3D scene flow estimation methods,
in this paper, we propose to learn the rigidity of a scene in a supervised
manner from an extensive collection of dynamic scene data, and directly
infer a rigidity mask from two sequential images with depths. With the
learned network, we show how we can effectively estimate camera motion
and projected scene flow using computed 2D optical flow and the inferred
rigidity mask. For training and testing the rigidity network, we also provide a new semi-synthetic dynamic scene dataset (synthetic foreground
objects with a real background) and an evaluation split that accounts for
the percentage of observed non-rigid pixels. Through our evaluation, we
show the proposed framework outperforms current state-of-the-art scene
flow estimation methods in challenging dynamic scenes