Abstract
The Straight-Through Estimator (STE) [Hinton, 2012; Bengio et al., 2013] is widely used for
back-propagating gradients through the quantization function, but the STE technique lacks a complete theoretical understanding. We propose an alternative methodology called alpha-blending (AB),
which quantizes neural networks to low precision
using stochastic gradient descent (SGD). Our
AB method avoids the STE approximation by replacing the quantized weight in the loss function with
an affine combination of the quantized weight w_q
and the corresponding full-precision weight w, with
non-trainable scalar coefficients α and (1 − α). During training, α is gradually increased from 0 to 1;
gradient updates to the weights flow through the
full-precision term (1 − α)w of the affine combination, so the model is converted from full precision to
low precision progressively. To evaluate the AB
method, we train a 1-bit BinaryNet [Hubara et al., 2016a]
on the CIFAR-10 dataset and 8-bit and 4-bit MobileNet
v1 and ResNet-50 v1/v2 on ImageNet using the alpha-blending approach; the evaluation indicates that AB improves top-1 accuracy by
0.9%, 0.82%, and 2.93%, respectively, compared to
the results of STE-based quantization [Hubara et al., 2016a; Krishnamoorthi, 2018].
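
A minimal sketch of the alpha-blending forward pass described above, assuming PyTorch and a simple uniform symmetric weight quantizer; the quantizer, the helper names, and the linear ramp schedule for α are illustrative assumptions, not the paper's exact training recipe.

    import torch

    def quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
        # Illustrative uniform symmetric quantizer with a per-tensor scale.
        qmax = 2 ** (num_bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax
        return torch.round(w / scale).clamp(-qmax, qmax) * scale

    def alpha_blend(w: torch.Tensor, alpha: float, num_bits: int = 8) -> torch.Tensor:
        # Affine combination alpha * w_q + (1 - alpha) * w.
        # The quantized term is detached from the graph, so gradients reach
        # the weights only through the full-precision term (1 - alpha) * w,
        # which is how AB avoids the STE approximation.
        w_q = quantize(w, num_bits).detach()
        return alpha * w_q + (1.0 - alpha) * w

    def alpha_schedule(step: int, ramp_steps: int) -> float:
        # One possible schedule: ramp alpha linearly from 0 to 1, so the loss
        # is computed on progressively more quantized weights; at alpha = 1
        # the forward pass uses only w_q.
        return min(1.0, step / ramp_steps)
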