ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network
Abstract
We introduce a light-weight, power efficient, and general purpose convolutional neural network, ESPNetv2,
for modeling visual and sequential data. Our network uses
group point-wise and depth-wise dilated separable convolutions to learn representations from a large effective receptive field with fewer FLOPs and parameters. The performance of our network is evaluated on four different tasks:
(1) object classification, (2) semantic segmentation, (3) object detection, and (4) language modeling. Experiments
on these tasks, including image classification on ImageNet and language modeling on the Penn Treebank dataset,
demonstrate the superior performance of our method over
the state-of-the-art methods. Our network outperforms ESPNet by 4-5% and has 2-4× fewer FLOPs on the PASCAL
VOC and Cityscapes datasets. Compared to YOLOv2
on MS-COCO object detection, ESPNetv2 delivers
4.4% higher accuracy with 6× fewer FLOPs. Our experiments show that ESPNetv2 is much more power efficient
than existing state-of-the-art efficient methods, including ShuffleNets and MobileNets. Our code is open-source
and available at https://github.com/sacmehta/ESPNetv2.
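
To make the claim about fewer parameters and a large effective receptive field concrete, the following minimal Python sketch counts the weights of a standard convolution against a depth-wise dilated convolution followed by a group point-wise convolution. It is an illustration under stated assumptions, not the ESPNetv2 implementation: the function names, the channel sizes, the group count, and the choice to ignore bias terms are all hypothetical.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Extent covered along one axis by a dilated kernel: (k - 1) * d + 1."""
    return (kernel_size - 1) * dilation + 1


def standard_conv_params(c_in: int, c_out: int, kernel_size: int) -> int:
    """Weights in a standard k x k convolution (bias ignored, an assumption)."""
    return c_in * c_out * kernel_size * kernel_size


def separable_conv_params(c_in: int, c_out: int, kernel_size: int, groups: int = 1) -> int:
    """Weights in a depth-wise k x k convolution plus a group 1 x 1 convolution.

    Dilation does not appear here because it changes the receptive field,
    not the number of weights. Bias terms are ignored (an assumption).
    """
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must both be divisible by groups")
    depthwise = c_in * kernel_size * kernel_size   # one k x k filter per input channel
    pointwise = (c_in // groups) * c_out           # 1 x 1 filters, split into groups
    return depthwise + pointwise


# Hypothetical layer: 64 -> 64 channels, 3 x 3 kernel, dilation 4, 4 groups.
standard = standard_conv_params(64, 64, 3)
separable = separable_conv_params(64, 64, 3, groups=4)
print(f"standard conv weights:  {standard}")
print(f"separable conv weights: {separable}")
print(f"effective kernel size:  {effective_kernel_size(3, 4)}")
```

In this hypothetical layer the separable form uses 1,600 weights instead of 36,864, while a dilation rate of 4 lets the 3 × 3 kernel span a 9 × 9 region. The reductions reported above for ESPNetv2 depend on the full architecture and are not reproduced by this count.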