Abstract
Conventional deep convolutional neural networks
(CNNs) apply convolution operators uniformly in space
across all feature maps for hundreds of layers, which incurs
a high computational cost for real-time applications. For
many problems such as object detection and semantic
segmentation, we are able to obtain a low-cost computation
mask, either from a priori problem knowledge, or from a
low-resolution segmentation network. We show that such
computation masks can be used to reduce computation
in the high-resolution main network. Variants of sparse
activation CNNs have previously been explored on small-scale
tasks and showed no degradation in terms of object
classification accuracy, but often measured gains in terms
of theoretical FLOPs without realizing a practical speed-up
when compared to highly optimized dense convolution
implementations. In this work, we leverage the sparsity
structure of computation masks and propose a novel
tiling-based sparse convolution algorithm. We verify the
effectiveness of our sparse CNN on LiDAR-based 3D object
detection, and we report significant wall-clock speed-ups
compared to dense convolution without noticeable loss of
accuracy.
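
To make the tiling idea concrete, the following is a minimal NumPy sketch, not the implementation evaluated in this work. It assumes a single 2D feature map whose height and width are divisible by the tile size, an odd kernel size, and zero padding. The function names (conv2d_valid, active_tiles, sparse_tiled_conv) are hypothetical and chosen for illustration only. The sketch reduces the computation mask to a list of active tiles, gathers each active tile together with its halo, runs a dense convolution on it, and scatters the result back, leaving inactive tiles at zero.

```python
import numpy as np

def conv2d_valid(x, w):
    """Dense 'valid' cross-correlation; x is (H, W, Cin), w is (k, k, Cin, Cout)."""
    k = w.shape[0]
    H, W, _ = x.shape
    out = np.zeros((H - k + 1, W - k + 1, w.shape[3]))
    for dy in range(k):
        for dx in range(k):
            # Shifted window times the (Cin, Cout) weight slice for this tap.
            out += x[dy:dy + H - k + 1, dx:dx + W - k + 1] @ w[dy, dx]
    return out

def active_tiles(mask, tile):
    """Return (row, col) indices of tiles that contain at least one active pixel.

    Assumes mask height and width are divisible by `tile`.
    """
    H, W = mask.shape
    blocks = mask.reshape(H // tile, tile, W // tile, tile)
    return np.argwhere(blocks.any(axis=(1, 3)))

def sparse_tiled_conv(x, w, mask, tile):
    """Zero-padded 'same' convolution computed only on active tiles.

    Assumes an odd kernel size; outputs in inactive tiles stay zero.
    """
    k = w.shape[0]
    pad = k // 2
    H, W, _ = x.shape
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros((H, W, w.shape[3]))
    for ty, tx in active_tiles(mask, tile):
        y0, x0 = ty * tile, tx * tile
        # Gather: the tile plus a halo of `pad` pixels on each side.
        patch = xp[y0:y0 + tile + 2 * pad, x0:x0 + tile + 2 * pad]
        # Dense convolution on the small patch, then scatter into the output.
        out[y0:y0 + tile, x0:x0 + tile] = conv2d_valid(patch, w)
    return out
```

In this sketch the cost scales with the number of active tiles rather than with the full spatial extent, and each tile is processed by an ordinary dense convolution. A Python loop over tiles is used here only for readability; it says nothing about the wall-clock behavior of an optimized implementation.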