Abstract
Object detectors have hugely profited from moving towards an end-to-end learning paradigm: merging proposals, features, and the classifier into one neural network
improved results two-fold on general object detection.
One indispensable component is non-maximum suppression
(NMS), a post-processing algorithm responsible for merging all detections that belong to the same object. The de
facto standard NMS algorithm is still fully hand-crafted,
suspiciously simple, and — being based on greedy clustering with a fixed distance threshold — forces a trade-off
between recall and precision. We propose a new network
architecture designed to perform NMS, using only boxes
and their scores. We report experiments for person detection
on PETS and for general object categories on the COCO
dataset. Our approach shows promise, providing improved
localization and occlusion handling.
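
For reference, the hand-crafted greedy baseline described above can be sketched in a few lines. This is a minimal illustration of standard greedy NMS, not the proposed network. It assumes boxes in (x1, y1, x2, y2) corner format and intersection-over-union as the overlap measure; the names `iou` and `greedy_nms` and the threshold values are hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) corners, with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.5,  # illustrative value, not taken from the paper
) -> List[int]:
    """Return indices of kept detections, highest score first.

    A detection is kept only if its overlap with every already-kept
    detection is at most the fixed threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes and one separate box (made-up values).
example_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
example_scores = [0.9, 0.8, 0.7]

strict = greedy_nms(example_boxes, example_scores, iou_threshold=0.5)
loose = greedy_nms(example_boxes, example_scores, iou_threshold=0.7)
print(strict, loose)
```

The two calls make the trade-off concrete. With the lower threshold the second box is suppressed, which removes a duplicate but would also remove a genuinely separate, occluded object. With the higher threshold it survives, which preserves recall but admits a possible double detection.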