Abstract
Pruning a neural net consists of removing weights without degrading its performance. This is an old problem of renewed interest because of the need to compress ever-larger nets so they can run on mobile devices. Pruning has traditionally been done by ranking or penalizing weights according to some criterion (such as magnitude), removing low-ranked weights and retraining the remaining ones.
We formulate pruning as an optimization problem: find the weights that minimize the loss while satisfying a pruning cost condition. We give a generic algorithm to solve this, which alternates “learning” steps that optimize a regularized, data-dependent loss with “compression” steps that mark weights for pruning in a data-independent way.
Magnitude thresholding arises naturally in the compression step, but, unlike existing magnitude-pruning approaches, our algorithm explores subsets of weights rather than committing irrevocably to a specific subset from the beginning. It is also able to learn automatically the best number of weights to prune in each layer of the net without incurring an exponentially costly model selection. Using a single pruning-level user parameter, we achieve state-of-the-art pruning on LeNet and ResNets of various sizes.
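To make the alternation concrete, here is a minimal numpy sketch of one such learning-compression scheme. The quadratic penalty coupling the weights to their pruned copy, the function names, and the mu schedule are illustrative assumptions rather than the paper's exact formulation; the pruning level kappa plays the role of the single user parameter mentioned above.

```python
import numpy as np

def learning_step(w, theta, grad_loss, mu, lr=1e-2, n_steps=100):
    """Data-dependent step: gradient descent on L(w) + (mu/2)*||w - theta||^2."""
    for _ in range(n_steps):
        w = w - lr * (grad_loss(w) + mu * (w - theta))
    return w

def compression_step(w, kappa):
    """Data-independent step: keep the kappa largest-magnitude weights, zero the rest."""
    theta = np.zeros_like(w)
    keep = np.argsort(np.abs(w))[-kappa:]   # magnitude thresholding
    theta[keep] = w[keep]
    return theta

def prune(w0, grad_loss, kappa, mu_schedule=(1e-3, 1e-2, 1e-1, 1.0)):
    """Alternate learning and compression while slowly enforcing w == theta."""
    w = w0.copy()
    theta = compression_step(w, kappa)
    for mu in mu_schedule:
        w = learning_step(w, theta, grad_loss, mu)
        theta = compression_step(w, kappa)   # pruned subset may change here
    return theta                             # final kappa-sparse weights

# Example usage with a toy quadratic loss L(w) = 0.5*||A w - b||^2:
# A, b = np.random.randn(20, 50), np.random.randn(20)
# w_pruned = prune(np.random.randn(50), lambda w: A.T @ (A @ w - b), kappa=10)
```

Because the compression step recomputes theta from the current weights, the set of surviving weights can change across iterations, which is the exploration of weight subsets referred to above.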