Abstract
Inference for state-of-the-art deep neural networks is
computationally expensive, making them difficult to deploy
in resource-constrained hardware environments. An efficient way
to reduce this complexity is to quantize the weight parameters and/or activations during training by approximating their distributions with a limited-entry codebook. At very low precisions, such as binary or ternary networks with 1-8 bit activations, the information loss from quantization leads to significant accuracy degradation due to
large gradient mismatches between the forward and backward functions. In this paper, we introduce a quantization
method to reduce this loss by learning a symmetric codebook for particular weight subgroups. These subgroups
are determined based on their locality in the weight matrix, such that the hardware simplicity of the low-precision
representations is preserved. Empirically, we show that
symmetric quantization can substantially improve accuracy for networks with extremely low-precision weights
and activations. We also demonstrate that this representation imposes minimal or no additional hardware cost relative to more coarse-grained approaches. Source code is available at
https://www.github.com/julianfaraone/SYQ
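To make the idea concrete, the following is a minimal NumPy sketch of symmetric quantization with one learned scaling factor per weight subgroup; here rows of the weight matrix stand in for the locality-based subgroups, and the function name symmetric_quantize, the row-wise grouping, and the mean-absolute-value initialization are illustrative assumptions rather than the released implementation (in practice the scaling factors are trained jointly with the network, typically with a straight-through estimator for the sign function).

```python
import numpy as np

def symmetric_quantize(W, alpha):
    """Binarize each row of W to the symmetric codebook {-alpha_i, +alpha_i}.

    Hypothetical illustration: each row is treated as one weight subgroup,
    and alpha holds one learned positive scaling factor per subgroup.
    """
    return np.sign(W) * alpha[:, None]

# Toy usage: initialize each subgroup's scaling factor to the mean
# absolute weight of its row, then quantize.
W = np.random.randn(4, 8).astype(np.float32)
alpha = np.abs(W).mean(axis=1)
W_q = symmetric_quantize(W, alpha)
```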