Abstract. Deep neural network compression has the potential to bring
modern resource-hungry deep networks to resource-limited devices. However, in many of the most compelling deployment scenarios of compressed
deep networks, the operational constraints matter: for example, a pedestrian detection network on a self-driving car may have to satisfy a latency
constraint for safe operation. We propose the first principled treatment of
deep network compression under operational constraints. We formulate
the compression learning problem from the perspective of constrained
Bayesian optimization, and introduce a cooling (annealing) strategy to
guide the network compression towards the target constraints. Experiments on ImageNet demonstrate the value of modelling constraints directly in network compression.