Abstract. Recently developed object detectors use the pyramid-shaped
feature layers of a convolutional neural network (CNN), obtained by
gradually adding layers, instead of a featurized image pyramid. However, the different abstraction levels of CNN feature layers often
limit the detection performance, especially on small objects. To overcome
this limitation, we propose a CNN-based object detection architecture,
referred to as a parallel feature pyramid (FP) network (PFPNet), where
the FP is constructed by widening the network instead of increasing its depth. First, we adopt spatial pyramid pooling and some
additional feature transformations to generate a pool of feature maps
with different sizes. In PFPNet, the additional feature transformation is
performed in parallel, which yields feature maps with similar levels
of semantic abstraction across the scales. We then resize the elements of
the feature pool to a uniform size and aggregate their contextual information to generate each level of the final FP. The experimental results
confirmed that PFPNet increases the performance of the latest version
of the single-shot multi-box detector (SSD) by 6.4% AP and, in
particular, by 7.8% APsmall on the MS-COCO dataset.
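
The three steps described above (pooling a base feature map into a multi-scale pool, transforming each scale independently, then resizing and aggregating the pool for every pyramid level) can be illustrated with a minimal pure-Python sketch. It is not the PFPNet implementation: the function names, the 2x pooling factors, the ReLU stand-in for the learned feature transformation, nearest-neighbour resizing, and channel concatenation as the aggregation step are all assumptions chosen for illustration.

```python
def avg_pool(channel, k):
    """Non-overlapping k x k average pooling on a square 2D list."""
    size = len(channel) // k
    return [
        [
            sum(channel[i * k + di][j * k + dj]
                for di in range(k) for dj in range(k)) / (k * k)
            for j in range(size)
        ]
        for i in range(size)
    ]


def resize_nearest(channel, size):
    """Nearest-neighbour resize of a square 2D list to size x size."""
    src = len(channel)
    return [
        [channel[i * src // size][j * src // size] for j in range(size)]
        for i in range(size)
    ]


def build_feature_pool(fmap, num_levels):
    """Spatial pyramid pooling (assumed factors): level n is pooled by 2**n."""
    return [[avg_pool(ch, 2 ** n) for ch in fmap] for n in range(num_levels)]


def transform(level):
    """Hypothetical stand-in for the learned per-level transformation (a ReLU)."""
    return [[[max(v, 0.0) for v in row] for row in ch] for ch in level]


def build_pyramid(pool):
    """For each level, resize every pool element to that level's size
    and concatenate the channels (assumed aggregation)."""
    pyramid = []
    for target in pool:
        size = len(target[0])
        merged = []
        for level in pool:
            merged.extend(resize_nearest(ch, size) for ch in level)
        pyramid.append(merged)
    return pyramid


# Toy base feature map: 2 channels of 8 x 8 values.
fmap = [[[float(r * 8 + c) for c in range(8)] for r in range(8)] for _ in range(2)]

# Each pool level is transformed independently of the others, so every
# scale passes through the same number of operations (the parallel part).
pool = [transform(level) for level in build_feature_pool(fmap, 3)]
pyramid = build_pyramid(pool)

# (channels, height, width) of each pyramid level.
shapes = [(len(p), len(p[0]), len(p[0][0])) for p in pyramid]
```

In this sketch every pyramid level sees information from all scales of the pool, and no level is derived from a deeper layer than any other, which is the property the abstract attributes to building the pyramid in width rather than in depth.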