Abstract
Convolutional networks are the de-facto standard for analyzing spatio-temporal data such as images, videos, and
3D shapes. Whilst some of this data is naturally dense (e.g.,
photos), many other data sources are inherently sparse. Examples include 3D point clouds that were obtained using
a LiDAR scanner or RGB-D camera. Standard “dense”
implementations of convolutional networks are very ineffi-
cient when applied on such sparse data. We introduce new
sparse convolutional operations that are designed to process spatially-sparse data more efficiently, and use them
to develop spatially-sparse convolutional networks. We
demonstrate the strong performance of the resulting models, called submanifold sparse convolutional networks (SSCNs), on two tasks involving semantic segmentation of 3D
point clouds. In particular, our models outperform all prior
state-of-the-art on the test set of a recent semantic segmentation competition