Abstract
Efficient deep neural network (DNN) inference on mobile or embedded devices typically involves quantization of the network parameters and activations. In particular, mixed precision networks achieve better performance than networks with homogeneous bitwidth under the same size constraint. Since choosing the optimal bitwidths is not straightforward, training methods that can learn them are desirable. Differentiable quantization with straight-through gradients allows the quantizer’s parameters to be learned with gradient methods. We show that a suitable parametrization of the quantizer is the key to achieving stable training and good final performance. Specifically, we propose to parametrize the quantizer by its step size and dynamic range; the bitwidth can then be inferred from them. Other parametrizations, which explicitly use the bitwidth, consistently perform worse. We confirm our findings with experiments on CIFAR-10 and ImageNet, obtaining mixed precision DNNs with learned quantization parameters that achieve state-of-the-art performance.
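As a rough illustration of how the bitwidth can be inferred from the other two quantities (the concrete quantizer convention and symbols below are assumptions for illustration, not taken from the abstract): for a symmetric, signed uniform quantizer with step size $d$ and dynamic range $q_{\max}$, the largest representable level fixes the bitwidth $b$ via
\[
q_{\max} = d\,\bigl(2^{b-1} - 1\bigr)
\quad\Longrightarrow\quad
b = \log_2\!\left(\frac{q_{\max}}{d} + 1\right) + 1,
\]
so learning $d$ and $q_{\max}$ with gradient methods implicitly learns $b$, which can then be rounded to an integer for deployment.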