Abstract
Fully convolutional networks (FCNs) have dominated face
detection for several years, owing to their inherent ability
to perform sliding-window search with shared kernels, which
eliminates redundant computation. Most recent
state-of-the-art methods, such as Faster R-CNN, SSD, YOLO
and FPN, use an FCN as their backbone. This raises a
question: is there a universal strategy that further
accelerates FCNs while also improving accuracy, and would
therefore accelerate all recent FCN-based methods? To
analyze this, we decompose the face search space
into two orthogonal directions, ‘scale’ and ‘spatial’. Only a
few coordinates in the space spanned by these two basis
vectors correspond to foreground. If an FCN could ignore most
of the remaining points, both the search space and the number
of false alarms would be significantly reduced. Based on this
idea, a novel
method named scale estimation and spatial attention proposal
(S2AP) is proposed to attend to specific scales in the image
pyramid and to valid locations within each scale layer.
Furthermore, we adopt a masked-convolution operation, driven
by the attention result, to accelerate the FCN computation.
Experiments show that the FCN-based method RPN
can be accelerated by about 4× with S2AP and masked-FCN,
while also achieving state-of-the-art results on the FDDB,
AFW and MALF face detection benchmarks.
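The abstract does not say how the attention result gates the convolution. The sketch below is a minimal NumPy illustration under stated assumptions: a single stride-1, unpadded convolution layer; a boolean spatial-attention mask defined on output positions; and a per-level scale score compared against a hypothetical threshold `scale_thresh`. The functions `masked_conv2d`, `dense_conv2d` and `s2ap_forward` are illustrative names, not the paper's implementation. It needs NumPy 1.20 or later for `sliding_window_view`.

```python
# Hypothetical sketch of S2AP-style gating (scale skipping + masked convolution);
# not the authors' implementation.
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def masked_conv2d(x, w, mask):
    """Stride-1, unpadded 2-D cross-correlation evaluated only where mask is True.

    x:    (C_in, H, W) input feature map
    w:    (C_out, C_in, k, k) convolution kernel
    mask: (H - k + 1, W - k + 1) boolean attention map over output positions
    Returns an array of shape (C_out, H - k + 1, W - k + 1); positions
    outside the mask are left at zero and never computed.
    """
    c_out, _, k, _ = w.shape
    out_h, out_w = mask.shape
    out = np.zeros((c_out, out_h, out_w), dtype=np.result_type(x, w))
    w_flat = w.reshape(c_out, -1)  # (C_out, C_in * k * k), ordered (c, i, j)
    for y, xx in zip(*np.nonzero(mask)):  # visit attended positions only
        patch = x[:, y:y + k, xx:xx + k].reshape(-1)  # same (c, i, j) order
        out[:, y, xx] = w_flat @ patch
    return out


def dense_conv2d(x, w):
    """Reference: the same cross-correlation computed at every position."""
    k = w.shape[-1]
    windows = sliding_window_view(x, (k, k), axis=(1, 2))  # (C_in, H', W', k, k)
    return np.einsum("chwij,ocij->ohw", windows, w)


def s2ap_forward(pyramid, w, scale_scores, masks, scale_thresh=0.5):
    """Hypothetical driver: skip pyramid levels whose estimated scale score
    is below scale_thresh, and run masked convolution on the remaining ones."""
    outputs = {}
    for level, (x, score, mask) in enumerate(zip(pyramid, scale_scores, masks)):
        if score < scale_thresh:
            continue  # scale estimation predicts no faces at this level
        outputs[level] = masked_conv2d(x, w, mask)
    return outputs


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 8, 8))
w = rng.standard_normal((4, 3, 3, 3))
mask = np.zeros((6, 6), dtype=bool)
mask[2:4, 1:5] = True  # 8 of the 36 output positions are attended

out = masked_conv2d(x, w, mask)
print(np.allclose(out[:, mask], dense_conv2d(x, w)[:, mask]))  # True
```

Because the loop visits only `mask.sum()` positions, the cost of the layer scales with the attended area (8 of 36 positions here) rather than the full map, and a pyramid level rejected by its scale score costs nothing. A practical version would batch the attended patches into one matrix multiply or use a sparse GPU kernel; stacking several masked layers would also require growing the mask by each layer's receptive field, which this sketch omits.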