Abstract. In this paper, we propose quantized densely connected U-Nets for efficient visual landmark localization. The idea is that features
with the same semantic meaning are globally reused across the stacked
U-Nets. This dense connectivity largely improves the information flow,
yielding higher localization accuracy. However, a vanilla dense design
would suffer from a critical efficiency issue in both training and testing. To
solve this problem, we first propose order-K dense connectivity to trim off
long-distance shortcuts; then, we use a memory-efficient implementation
to significantly boost the training efficiency and investigate an iterative
refinement that may slice the model size in half. Finally, to reduce
memory consumption and high-precision operations in both training
and testing, we further quantize the weights, inputs, and gradients of our
localization network to low bit-width numbers. We validate our approach
in two tasks: human pose estimation and face alignment. The results show
that our approach achieves state-of-the-art localization accuracy while
using ∼70% fewer parameters, ∼98% less model size, and ∼32× less
training memory than other benchmark localizers.