Abstract
Benefiting from tens of millions of hierarchically stacked learnable parameters, Deep Neural Networks (DNNs) have demonstrated overwhelming accuracy on a variety of artificial intelligence tasks. However, the large size of DNN models places a heavy burden on storage, computation, and power consumption, which prohibits their deployment on embedded and mobile systems. In this paper, we propose Explicit Loss-error-aware Quantization (ELQ), a new method that can train DNN models with very low-bit parameter values, such as ternary and binary ones, to approximate their 32-bit floating-point counterparts without noticeable loss of prediction accuracy. Unlike existing methods that usually pose the problem as a straightforward approximation of the layer-wise weights or outputs of the original full-precision model (specifically, minimizing the error of the layer-wise weights, or of the inner products of the weights and the inputs, between the original and the respective quantized models), our ELQ elaborately bridges the loss perturbation caused by weight quantization with an incremental quantization strategy to address DNN quantization. By explicitly regularizing the loss perturbation and the weight approximation error in an incremental way, we show that such a new optimization method is theoretically reasonable and practically effective. As validated on two mainstream convolutional neural network families (i.e., fully convolutional and non-fully convolutional), our ELQ shows better results than state-of-the-art quantization methods on the large-scale ImageNet classification dataset. Code will be made publicly available.
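The following is a minimal illustrative sketch of the kind of objective the abstract describes, combining a loss-perturbation term with a weight-approximation error for ternary quantization. It is an assumption based only on the abstract, not the paper's actual formulation; the names `ternarize`, `elq_style_objective`, `loss_fn`, and `lam` are hypothetical.

```python
# Sketch only: an assumed combination of loss perturbation and weight
# approximation error, not the paper's exact ELQ objective.
import numpy as np

def ternarize(w, alpha, thresh):
    """Map full-precision weights to the ternary set {-alpha, 0, +alpha}."""
    q = np.zeros_like(w)
    q[w > thresh] = alpha
    q[w < -thresh] = -alpha
    return q

def elq_style_objective(w, wq, loss_fn, lam=1e-3):
    """Loss of the quantized weights, explicitly regularized by the loss
    perturbation |L(w) - L(wq)| and the weight approximation error."""
    loss_pert = abs(loss_fn(wq) - loss_fn(w))   # explicit loss-error term
    approx_err = float(np.sum((w - wq) ** 2))   # layer-wise weight error
    return loss_fn(wq) + loss_pert + lam * approx_err

# Toy usage: a quadratic "loss" stands in for the full network loss.
rng = np.random.default_rng(0)
w = rng.normal(size=100)
loss_fn = lambda v: float(np.mean(v ** 2))
alpha = float(np.mean(np.abs(w)))
wq = ternarize(w, alpha, 0.5 * alpha)
print(elq_style_objective(w, wq, loss_fn))
```

In practice such an objective would be applied incrementally, quantizing a growing subset of weights while the remaining full-precision weights are retrained; the sketch above only shows the per-step objective, not the incremental schedule.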