Abstract. This work proposes an algorithm, called NetAdapt, that automatically adapts a pre-trained deep neural network to a mobile platform given a resource budget. While many existing algorithms simplify
networks based on the number of multiply-accumulate operations (MACs) or weights, optimizing those
indirect metrics does not necessarily reduce the direct metrics, such as
latency and energy consumption. To solve this problem, NetAdapt incorporates direct metrics into its adaptation algorithm. These direct metrics
are evaluated using empirical measurements, so that detailed knowledge
of the platform and toolchain is not required. NetAdapt automatically
and progressively simplifies a pre-trained network until the resource budget is met while maximizing the accuracy. Experimental results show that
NetAdapt achieves better accuracy-versus-latency trade-offs on both mobile CPU and mobile GPU than state-of-the-art automated network simplification algorithms. For image classification on the
ImageNet dataset, NetAdapt achieves up to a 1.7× speedup in measured
inference latency with equal or higher accuracy on MobileNets (V1 and V2).
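
The progressive, budget-driven simplification described above can be illustrated with a minimal sketch. This is not the authors' implementation: the network is reduced to a hypothetical list of per-layer filter counts, and `adapt`, `toy_latency`, and `toy_accuracy` are illustrative names. In practice the resource function would be an empirical measurement on the target device and the accuracy function an evaluation on held-out data; here both are stand-in formulas so the loop can be run.

```python
import math
from typing import Callable, List

# Hypothetical stand-in for a network: the number of filters in each layer.
Network = List[int]


def adapt(
    network: Network,
    measure_resource: Callable[[Network], float],
    evaluate_accuracy: Callable[[Network], float],
    budget: float,
    step: int = 1,
) -> Network:
    """Greedily shrink one layer at a time until the measured resource fits the budget.

    At each iteration, every layer yields one proposal (that layer narrowed by
    `step` filters); the proposal with the highest accuracy is kept.
    """
    current = list(network)  # copy so the caller's list is not modified
    while measure_resource(current) > budget:
        proposals = []
        for i, width in enumerate(current):
            if width - step >= 1:  # never remove a layer entirely
                proposal = list(current)
                proposal[i] = width - step
                proposals.append(proposal)
        if not proposals:
            raise ValueError("budget cannot be met by shrinking layers further")
        current = max(proposals, key=evaluate_accuracy)
    return current


def toy_latency(network: Network) -> float:
    # Assumption: latency grows linearly with width, and later layers cost more.
    return sum((i + 1) * width for i, width in enumerate(network))


def toy_accuracy(network: Network) -> float:
    # Assumption: accuracy has diminishing returns in each layer's width.
    return sum(math.sqrt(width) for width in network)


if __name__ == "__main__":
    adapted = adapt([8, 8, 8], toy_latency, toy_accuracy, budget=30.0)
    print(adapted, toy_latency(adapted))
```

Because the loop only queries `measure_resource` as a black box, swapping the toy formula for on-device timing requires no knowledge of the platform internals, which is the property the abstract emphasizes.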