PointFlowNet: Learning Representations for
Rigid Motion Estimation from Point Clouds
Abstract
Despite significant progress in image-based 3D scene
flow estimation, the performance of such approaches has
not yet reached the fidelity required by many applications.
Simultaneously, these applications are often not restricted to
image-based estimation: laser scanners provide a popular
alternative to traditional cameras, for example in the context
of self-driving cars, as they directly yield a 3D point cloud.
In this paper, we propose to estimate 3D motion from such
unstructured point clouds using a deep neural network. In
a single forward pass, our model jointly predicts 3D scene
flow as well as the 3D bounding box and rigid body motion
of objects in the scene. While the prospect of estimating 3D
scene flow from unstructured point clouds is promising, it is
also a challenging task. We show that the traditional global
representation of rigid body motion prohibits inference by
CNNs, and propose a translation-equivariant representation
to circumvent this problem. Training our deep network requires
a large dataset; we therefore augment real
scans from KITTI with virtual objects, realistically modeling
occlusions and simulating sensor noise. A thorough comparison with classic and learning-based techniques highlights
the robustness of the proposed approach.
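To make the equivariance argument concrete, the short LaTeX sketch below works out why a global rigid-motion parameterization is ill-suited to prediction with translation-equivariant CNN features, and why a per-point local parameterization avoids the issue. The notation (x, p, o, t_p) is chosen here for illustration and is not taken verbatim from the paper.

% Illustrative sketch; notation is ours, not the paper's exact formulation.
% Global parameterization: a rigid motion maps a point x to
\[
  x' = R\,x + t .
\]
% Shifting the whole scene by an offset o sends x to x + o and x' to x' + o.
% Explaining the shifted motion with the same rotation R then forces
\[
  R\,(x + o) + t' = R\,x + t + o
  \quad\Longrightarrow\quad
  t' = t + (I - R)\,o ,
\]
% so the translation target depends on the absolute scene position o,
% which translation-equivariant features cannot predict consistently.
% Local parameterization: at each point p, store a rotation about p itself
% together with a local translation t_p,
\[
  x' = R\,(x - p) + p + t_p .
\]
% Under the same scene shift, p maps to p + o and the target t_p is
% unchanged, so the local targets are translation equivariant.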