Abstract
Text in natural images appears in arbitrary orientations and must therefore be detected in terms of oriented bounding boxes. A multi-oriented text detector typically involves two key
tasks: 1) text presence detection, which is a classification
problem disregarding text orientation; 2) oriented bounding box regression, which depends on text orientation.
Previous methods rely on shared features for both tasks,
resulting in degraded performance due to the incompatibility of the two tasks. To address this issue, we propose
to perform classification and regression on features of different characteristics, extracted by two network branches
of different designs. Concretely, the regression branch extracts rotation-sensitive features by actively rotating the
convolutional filters, while the classification branch extracts rotation-invariant features by pooling the rotation-sensitive features. The proposed method, named Rotation-sensitive Regression Detector (RRD), achieves state-of-the-art performance on several oriented scene text benchmark
datasets, including ICDAR 2015, MSRA-TD500, RCTW-17,
and COCO-Text. Furthermore, RRD achieves a significant
improvement on a ship collection dataset, demonstrating its
generality to oriented object detection.
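The two-branch idea can be illustrated with a toy sketch (not the authors' implementation): a single filter is actively rotated to several orientations, producing a rotation-sensitive response stack for regression, and max-pooling that stack over orientations yields a rotation-invariant map for classification. The function names are hypothetical, and the sketch uses exact 90-degree filter rotations for simplicity, whereas RRD uses finer-grained oriented filters.

```python
import numpy as np

def correlate2d(img, filt):
    """Valid-mode 2-D cross-correlation (no padding), for illustration only."""
    K = filt.shape[0]
    H, W = img.shape
    out = np.empty((H - K + 1, W - K + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + K, j:j + K] * filt)
    return out

def oriented_responses(img, filt, n_orient=4):
    """Rotation-sensitive stack: responses of the same filter rotated to
    n_orient orientations (90-degree steps, so rotation is exact on the grid)."""
    return np.stack([correlate2d(img, np.rot90(filt, k))
                     for k in range(n_orient)])

def invariant_features(img, filt, n_orient=4):
    """Rotation-invariant map: max-pool the sensitive stack over orientations."""
    return oriented_responses(img, filt, n_orient).max(axis=0)
```

With 90-degree steps the invariance is exact: rotating the input rotates the pooled map but leaves its values unchanged, i.e. `invariant_features(np.rot90(img), f)` equals `np.rot90(invariant_features(img, f))` for a square image.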