A tiny improvement on SSD (Single Shot MultiBox Detector), which adds a
feature-map concatenation module and an FPN module [1] to the head of SSD.
This project is based on the original SSD project, which is implemented in Caffe.
In this repository, we propose using the concatenation module and the FPN module to detect small objects more effectively.
Feature
I add three extra layers, generated by the concatenation module, to the
head of SSD. The concatenation module is shown below.
The feature-fused layer consists of 512 feature maps of size 2H×2W, built
from three branches. The first 128 feature maps are generated by
subsampling the shallower 4H×4W feature layer with a 3×3 convolution
kernel followed by ReLU activation. A batch normalization layer is applied
after subsampling, mainly because the features learned by a shallow layer
and those learned by a higher layer have different distributions and a gap
between them, which makes the combined features difficult to learn from
and predict with. The middle 256 feature maps are generated from the 2H×2W
prediction layer by dimension reduction and feature combination through a
3×3 convolution kernel, followed by ReLU. The last 128 feature maps are
obtained by upsampling the higher-level H×W feature layer with a 2×2
convolution kernel, followed by ReLU.
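The spatial bookkeeping of the three branches can be checked with a few lines of plain Python. This is a minimal sketch under assumptions not stated above: the subsampling convolution uses stride 2 and padding 1, the middle convolution uses stride 1 and padding 1, and the 2×2 upsampling is treated as a transposed convolution (deconvolution) with stride 2. The helper names `conv_out`, `deconv_out`, and `fused_shape` are hypothetical.

```python
# Shape check for the concatenation module (illustrative sketch only).
# ASSUMPTIONS: stride 2 / pad 1 for the subsampling conv, stride 1 / pad 1
# for the middle conv, and a stride-2 transposed conv for the 2x2 upsampling.
from typing import Tuple


def conv_out(size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Spatial size after a standard convolution."""
    return (size + 2 * pad - kernel) // stride + 1


def deconv_out(size: int, kernel: int, stride: int = 1, pad: int = 0) -> int:
    """Spatial size after a transposed convolution (deconvolution)."""
    return (size - 1) * stride - 2 * pad + kernel


def fused_shape(h: int, w: int) -> Tuple[int, int, int]:
    """(channels, height, width) of the fused layer for an H x W top layer."""
    # Branch 1: subsample the shallow 4H x 4W layer to 2H x 2W, 128 maps.
    b1 = (128, conv_out(4 * h, 3, 2, 1), conv_out(4 * w, 3, 2, 1))
    # Branch 2: 3x3 conv on the 2H x 2W prediction layer, 256 maps.
    b2 = (256, conv_out(2 * h, 3, 1, 1), conv_out(2 * w, 3, 1, 1))
    # Branch 3: upsample the high-level H x W layer to 2H x 2W, 128 maps.
    b3 = (128, deconv_out(h, 2, 2), deconv_out(w, 2, 2))
    # Channel-wise concatenation requires identical spatial sizes.
    assert b1[1:] == b2[1:] == b3[1:], (b1, b2, b3)
    return (b1[0] + b2[0] + b3[0], b1[1], b1[2])


print(fused_shape(10, 10))  # (512, 20, 20)
```

Under these assumptions each branch lands on 2H×2W for any H and W, and the channel counts add up to the 512 maps of the fused layer.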
After the feature maps from the three layers of the feature pyramid are
concatenated, a 3×3 convolution kernel is used to learn the fused feature
maps, in order to eliminate the differences in distribution and the gap
between them.
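A toy, pure-Python sketch of this fusion step: maps from the three branches are stacked along the channel axis, then a 3×3 convolution mixes all channels at every position. It assumes stride 1 and zero padding 1 (so the 2H×2W size is kept), uses nested lists indexed [channel][row][col], scales the branches down to 1, 2, and 1 channels, and computes a single output map for brevity. The function names are hypothetical, and this is not the repository's Caffe layer.

```python
# Toy sketch: channel-wise concatenation followed by a 3x3 "same" convolution.
# ASSUMPTIONS: stride 1, zero padding 1, one output map, nested-list tensors.
from typing import List

Map = List[List[float]]   # one H x W feature map
Tensor = List[Map]        # C feature maps, indexed [channel][row][col]


def concat_channels(*branches: Tensor) -> Tensor:
    """Stack feature maps from several branches along the channel axis."""
    out: Tensor = []
    for branch in branches:
        out.extend(branch)
    return out


def conv3x3_same(x: Tensor, weight: List[List[List[float]]], bias: float = 0.0) -> Map:
    """One output map of a 3x3, stride-1, zero-padded conv summed over all channels."""
    h, w = len(x[0]), len(x[0][0])
    out = [[bias] * w for _ in range(h)]
    for c, fmap in enumerate(x):
        for i in range(h):
            for j in range(w):
                for di in range(-1, 2):
                    for dj in range(-1, 2):
                        r, s = i + di, j + dj
                        if 0 <= r < h and 0 <= s < w:  # zero padding outside the map
                            out[i][j] += weight[c][di + 1][dj + 1] * fmap[r][s]
    return out


# Usage: three scaled-down branches (1 + 2 + 1 channels) of 4x4 maps.
ones = [[1.0] * 4 for _ in range(4)]
fused = concat_channels([ones], [ones, ones], [ones])
center = [[[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]] for _ in fused]
y = conv3x3_same(fused, center)
print(len(fused), len(y), len(y[0]), y[0][0])  # 4 4 4 4.0
```

Because every output position sums over all concatenated channels, the learned 3×3 weights can rebalance the shallow, middle, and upsampled branches against each other.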
The overall network is shown below.
Result
Reference
[1] Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B.,
& Belongie, S.: Feature pyramid networks for object detection.
In: CVPR (2017).