Abstract
We take a new approach to computing dense scene flow between a pair of consecutive RGB-D frames. We exploit the availability of depth data by seeking correspondences with respect to patches specified not as the pixels inside square windows, but as the 3D points that are the inliers of spheres in world space. Our primary contribution is to show that by reasoning in terms of such patches under 6 DoF rigid body motions in 3D, we succeed in obtaining compelling results at displacements large and small without relying on either of two simplifying assumptions that pervade much of the earlier literature: brightness constancy or local surface planarity. As a consequence of our approach, our output is a dense field of 3D rigid body motions, in contrast to the 3D translations that are the norm in scene flow. Reasoning in our manner additionally allows us to carry out occlusion handling using a 6 DoF consistency check for the flow computed in both directions and a patchwise silhouette check to help reason about alignments in occlusion areas, and to promote smoothness of the flow fields using an intuitive local rigidity prior. We carry out our optimization in two steps, obtaining a first correspondence field using an adaptation of PatchMatch, and subsequently using α-expansion to jointly handle occlusions and perform regularization. We show attractive flow results on challenging synthetic and real-world scenes that push the practical limits of the aforementioned assumptions.
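To make the patch definition concrete, the following is a minimal sketch (not the implementation used in this work) of selecting the 3D points that fall inside a sphere and moving them under a 6 DoF rigid body motion. The function names, the toy point cloud, the radius, and the axis-angle parameterization of the rotation are all assumptions introduced for illustration. The rotation here is taken about the world origin.

```python
import numpy as np

def sphere_inliers(points, center, radius):
    """Return the points within `radius` of `center` (a spherical 3D patch).
    Hypothetical helper, named for illustration only."""
    dist = np.linalg.norm(points - center, axis=1)
    return points[dist <= radius]

def rotation_from_axis_angle(axis, angle):
    """Rodrigues' formula: 3x3 rotation by `angle` radians about `axis`."""
    axis = np.asarray(axis, dtype=float)
    x, y, z = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -z, y],
                  [z, 0.0, -x],
                  [-y, x, 0.0]])  # cross-product (skew-symmetric) matrix
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def apply_rigid_motion(points, R, t):
    """Apply p -> R p + t to every row of `points` (3 rotational + 3 translational DoF)."""
    return points @ R.T + t

# Toy point cloud in world space (assumed values): three nearby points, one far away.
points = np.array([[0.00, 0.00, 1.0],
                   [0.05, 0.00, 1.0],
                   [0.00, 0.05, 1.0],
                   [1.00, 1.00, 2.0]])

# The patch is the set of sphere inliers, not a square pixel window.
patch = sphere_inliers(points, center=np.array([0.0, 0.0, 1.0]), radius=0.1)

# One candidate rigid motion: 90 degrees about the z-axis, then a small shift in x.
R = rotation_from_axis_angle([0.0, 0.0, 1.0], np.pi / 2)
t = np.array([0.1, 0.0, 0.0])
moved = apply_rigid_motion(patch, R, t)
```

Because the motion is rigid, distances between points of the patch are preserved, which is what distinguishes a per-patch 6 DoF motion from assigning each point an independent 3D translation.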