KCNN: Kernel-wise Quantization to Remarkably Decrease Multiplications in
Convolutional Neural Network
Abstract
Convolutional neural networks (CNNs) have demonstrated state-of-the-art performance in computer vision tasks. However, the high computational demands that recent CNNs place on the devices running them have hampered many of their applications. Recently, many methods have quantized the floating-point weights and activations to fixed-point or binary values in order to convert floating-point arithmetic into integer or bit-wise arithmetic. However, since the distributions of values in CNNs are extremely complex, fixed-point or binary values lead to numerical information loss and cause performance degradation. On the other hand, convolution is composed of multiplications and accumulations, and multiplications are considerably more costly to implement in hardware than accumulations. We can therefore preserve the rich information of floating-point values on dedicated low-power devices by drastically decreasing the number of multiplications. In this paper, we quantize the floating-point weights of each kernel separately into multiple bit planes to remarkably decrease the number of multiplications. We obtain a closed-form solution via an aggressive Lloyd algorithm and adopt fine-tuning to optimize the bit planes. Furthermore, we propose dual normalization to solve the pathological curvature problem during fine-tuning. Our quantized networks show negligible performance loss compared to their floating-point counterparts.
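As a rough illustration of the idea (not the paper's exact algorithm, whose details follow in later sections), the sketch below quantizes a single kernel's weights with a plain one-dimensional Lloyd (k-means) pass and then evaluates an inner product with only one multiplication per quantized level, accumulating the inputs that share a level beforehand. The function names and the choice of four levels are hypothetical.

```python
import numpy as np

def lloyd_1d(weights, num_levels, iters=50):
    """Quantize one kernel's weights to `num_levels` values with 1-D Lloyd (k-means)."""
    levels = np.linspace(weights.min(), weights.max(), num_levels)
    for _ in range(iters):
        # Assign each weight to its nearest quantization level.
        codes = np.argmin(np.abs(weights[:, None] - levels[None, :]), axis=1)
        # Move each level to the centroid of the weights assigned to it.
        for k in range(num_levels):
            if np.any(codes == k):
                levels[k] = weights[codes == k].mean()
    return levels, codes

def quantized_dot(x, levels, codes):
    """Inner product with a quantized kernel: one multiplication per level,
    everything else is accumulation of inputs sharing the same level."""
    acc = 0.0
    for k, level in enumerate(levels):
        acc += level * x[codes == k].sum()  # num_levels multiplications in total
    return acc

# Toy usage: a 3x3 kernel flattened to 9 weights, quantized to 4 levels.
rng = np.random.default_rng(0)
w = rng.normal(size=9)
x = rng.normal(size=9)
levels, codes = lloyd_1d(w, num_levels=4)
print("exact:", float(x @ w), "quantized:", quantized_dot(x, levels, codes))
```

With 4 levels, the 9-element inner product needs 4 multiplications instead of 9, and the saving grows with kernel size; the quantization error depends on how well the per-kernel levels fit the weight distribution.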