Abstract
To predict a set of diverse and informative proposals with
enriched representations, this paper introduces a differentiable Determinantal Point Process (DPP) layer that can
augment object detection architectures. Most modern
object detection architectures, such as Faster R-CNN, learn
to localize objects by minimizing deviations from the ground
truth, but ignore correlations between multiple proposals
and object categories. Non-Maximum Suppression (NMS), a
widely used proposal pruning scheme, ignores label- and
instance-level relations between object candidates, which results
in multi-labeled detections. In the multi-class case, NMS
selects the boxes with the largest prediction scores, ignoring the
semantic relations between the categories of the boxes it could select.
In contrast, our trainable DPP layer, allowing for Learning
Detection with Diverse Proposals (LDDP), considers both
label-level contextual information and spatial layout relationships between proposals without increasing the number
of parameters of the network, and thus improves location
and category specifications of final detected bounding boxes
substantially during both training and inference.
Furthermore, we show that LDDP keeps its superiority over
Faster R-CNN even when the number of proposals generated by
LDDP is only ~30% of that for Faster R-CNN.
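
As a rough illustration of why a determinant-based score favors diverse proposals, the sketch below scores pairs of boxes with a generic DPP L-ensemble kernel, L = diag(q) S diag(q). It is not the LDDP layer itself: the use of IoU as the similarity S, the box coordinates, the quality scores q, and the helper names `iou` and `pair_log_det` are all assumptions made for this example.

```python
import math

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def pair_log_det(box_i, q_i, box_j, q_j):
    """Log-determinant of the 2x2 kernel L = diag(q) S diag(q) for one pair.

    With S = [[1, s], [s, 1]] the determinant is (q_i * q_j)^2 * (1 - s^2),
    so it shrinks as the two boxes overlap more (assumed similarity: IoU).
    """
    s = iou(box_i, box_j)
    det = (q_i * q_j) ** 2 * (1.0 - s * s)
    return math.log(det) if det > 0.0 else float("-inf")

# Hypothetical proposals: a and b overlap heavily, c is spatially separate.
box_a, q_a = (0.0, 0.0, 10.0, 10.0), 0.9
box_b, q_b = (1.0, 1.0, 11.0, 11.0), 0.8
box_c, q_c = (20.0, 20.0, 30.0, 30.0), 0.7

score_overlapping = pair_log_det(box_a, q_a, box_b, q_b)
score_diverse = pair_log_det(box_a, q_a, box_c, q_c)
print("overlapping pair (a, b):", score_overlapping)
print("diverse pair (a, c):    ", score_diverse)
```

Under these assumed values, the spatially separate pair (a, c) scores higher than the overlapping pair (a, b), even though b has a higher individual score than c; a selection that ranks boxes by score alone would not capture this.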