Abstract
The Straight-Through Estimator (STE) [Hinton, 2012; Bengio et al., 2013] is widely used for
back-propagating gradients through the quantization function, but the STE technique lacks a complete theoretical understanding. We propose an alternative methodology called alpha-blending (AB),
which quantizes neural networks to low precision
using stochastic gradient descent (SGD). Our
AB method avoids the STE approximation by replacing the quantized weight in the loss function with
an affine combination of the quantized weight w_q
and the corresponding full-precision weight w, with
non-trainable scalar coefficients α and (1 − α). During training, α is gradually increased from 0 to 1;
gradient updates to the weights flow through the
full-precision term (1 − α)w of the affine combination, so the model is converted from full precision to
low precision progressively. To evaluate the AB
method, we train a 1-bit BinaryNet [Hubara et al., 2016a]
on the CIFAR-10 dataset and 8-bit and 4-bit MobileNet
v1 and ResNet-50 v1/v2 on ImageNet using the alpha-blending approach; the evaluation indicates that AB improves top-1 accuracy by
0.9%, 0.82%, and 2.93%, respectively, compared to
the results of STE-based quantization [Hubara et al., 2016a; Krishnamoorthi, 2018].
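
A minimal sketch of the alpha-blending forward pass described above, assuming PyTorch and a simple uniform symmetric weight quantizer; the quantizer, the helper names, and the linear ramp schedule for α are illustrative assumptions, not the paper's exact training recipe.

    import torch

    def quantize(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
        # Illustrative uniform symmetric quantizer with a per-tensor scale.
        qmax = 2 ** (num_bits - 1) - 1
        scale = w.abs().max().clamp(min=1e-8) / qmax
        return torch.round(w / scale).clamp(-qmax, qmax) * scale

    def alpha_blend(w: torch.Tensor, alpha: float, num_bits: int = 8) -> torch.Tensor:
        # Affine combination alpha * w_q + (1 - alpha) * w.
        # The quantized term is detached from the graph, so gradients reach
        # the weights only through the full-precision term (1 - alpha) * w,
        # which is how AB avoids the STE approximation.
        w_q = quantize(w, num_bits).detach()
        return alpha * w_q + (1.0 - alpha) * w

    def alpha_schedule(step: int, ramp_steps: int) -> float:
        # One possible schedule: ramp alpha linearly from 0 to 1, so the loss
        # is computed on progressively more quantized weights; at alpha = 1
        # the forward pass uses only w_q.
        return min(1.0, step / ramp_steps)
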