LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
Abstract. Although weight and activation quantization is an effective
approach for Deep Neural Network (DNN) compression and holds great
potential for faster inference by leveraging bit operations, there is
still a noticeable prediction-accuracy gap between the quantized model and its full-precision counterpart. To address this gap, we propose
to jointly train a quantized, bit-operation-compatible DNN and its associated quantizers, as opposed to using fixed, handcrafted quantization
schemes such as uniform or logarithmic quantization. Our method for
learning the quantizers applies to both network weights and activations
with arbitrary bit precision, and our quantizers are easy to train. Our
comprehensive experiments on the CIFAR-10 and ImageNet datasets show
that our method works consistently well for various network structures
such as AlexNet, VGG-Net, GoogLeNet, ResNet, and DenseNet, surpassing previous quantization methods in terms of accuracy by an appreciable
margin. Our code is available at https://github.com/Microsoft/LQ-Nets.
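
To make the idea of jointly training a network and its quantizers concrete, the following is a minimal, generic sketch (not the paper's exact LQ-Nets quantizer): it builds quantization levels from a small trainable basis vector and uses a straight-through estimator so that both the inputs being quantized and the quantizer parameters receive gradients. The class name `LearnableQuantizer` and the 2-bit default are assumptions made for illustration.

```python
# Illustrative sketch of a learnable quantizer trained jointly with a network.
# Assumptions: PyTorch; levels are formed as signed combinations of a trainable
# basis vector, and gradients flow via a straight-through estimator.
import itertools

import torch
import torch.nn as nn


class LearnableQuantizer(nn.Module):
    def __init__(self, num_bits=2):
        super().__init__()
        # One trainable coefficient per bit; the 2^num_bits quantization
        # levels are all +/- combinations of these coefficients.
        self.basis = nn.Parameter(torch.linspace(0.1, 0.4, num_bits))
        codes = list(itertools.product([-1.0, 1.0], repeat=num_bits))
        self.register_buffer("codes", torch.tensor(codes))  # (2^K, K)

    def forward(self, x):
        # Quantization levels spanned by the current basis.
        levels = self.codes @ self.basis  # (2^K,)
        # Snap each input value to its nearest level.
        idx = (x.unsqueeze(-1) - levels).abs().argmin(dim=-1)
        xq = levels[idx]
        # Straight-through estimator: the output value equals xq, the
        # gradient w.r.t. x passes through as identity, and the basis
        # still receives gradients through `levels`.
        return xq + x - x.detach()


if __name__ == "__main__":
    quant = LearnableQuantizer(num_bits=2)
    w = torch.randn(4, 4, requires_grad=True)
    loss = quant(w).pow(2).sum()
    loss.backward()
    print(quant.basis.grad)  # quantizer parameters are trained jointly
```

In this sketch the quantizer parameters are updated purely by gradient descent; the actual method in the paper may learn its quantizers differently, but the example conveys how a quantizer can be optimized jointly with the network it quantizes.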