Abstract
Localizing text in the wild is challenging when the targets
have complicated geometric layouts, such as arbitrary
orientations and large aspect ratios. In this paper, we propose
a geometry-aware modeling approach tailored for scene
text representation and trained with an end-to-end learning
scheme. In our approach, a novel Instance Transformation
Network (ITN) learns a geometry-aware representation that
encodes the unique geometric configuration of each scene
text instance through in-network transformation embedding,
resulting in a robust and elegant framework that detects words
or text lines in a single pass. The ITN is trained with an
end-to-end multi-task learning strategy that combines
transformation regression, text/non-text classification, and
coordinate regression.
Experiments on benchmark datasets demonstrate the effectiveness
of the proposed approach in detecting scene text across
various geometric configurations.
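To make the two ideas in the abstract more concrete, the sketch below first warps a regular k x k sampling grid with a 2x3 affine transform. This is the kind of per-location transformation a network could regress so that its sampling points follow a rotated, elongated text instance. It then combines the three training terms (transformation regression, text/non-text classification, coordinate regression) into one weighted objective. Everything here is an illustrative assumption rather than the ITN's published configuration: the helper names (`regular_grid`, `warp_grid`, `rotation_scale`, `multitask_loss`), the affine parameterization, the choice of smooth-L1 and binary cross-entropy, and the unit loss weights.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical 2x3 affine matrix ((a, b, tx), (c, d, ty)).
Affine = Tuple[Tuple[float, float, float], Tuple[float, float, float]]
Point = Tuple[float, float]


def regular_grid(k: int) -> List[Point]:
    """Offsets of a k x k sampling grid centred at the origin (k odd), row-major."""
    r = k // 2
    return [(float(x), float(y)) for y in range(-r, r + 1) for x in range(-r, r + 1)]


def warp_grid(grid: Sequence[Point], theta: Affine, center: Point) -> List[Point]:
    """Apply a 2x3 affine transform to the grid offsets, then shift them to `center`.

    With the identity transform the grid stays axis-aligned; a rotation plus
    anisotropic scaling stretches it along an oriented, elongated text instance.
    """
    (a, b, tx), (c, d, ty) = theta
    cx, cy = center
    return [(cx + a * x + b * y + tx, cy + c * x + d * y + ty) for x, y in grid]


def rotation_scale(angle_rad: float, sx: float, sy: float) -> Affine:
    """Hypothetical helper: scale x by sx and y by sy, then rotate by angle_rad."""
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    return ((cos_a * sx, -sin_a * sy, 0.0), (sin_a * sx, cos_a * sy, 0.0))


def smooth_l1(pred: Sequence[float], target: Sequence[float], beta: float = 1.0) -> float:
    """Mean smooth-L1 (Huber-style) distance between two equal-length vectors."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total / len(pred)


def bce(prob: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one text/non-text probability (clamped for stability)."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def multitask_loss(theta_pred: Sequence[float], theta_gt: Sequence[float],
                   text_prob: float, is_text: int,
                   coords_pred: Sequence[float], coords_gt: Sequence[float],
                   w_trans: float = 1.0, w_cls: float = 1.0, w_coord: float = 1.0) -> float:
    """Weighted sum of classification, transformation and coordinate terms.

    The two regression terms are only added for positive (text) samples.
    """
    loss = w_cls * bce(text_prob, is_text)
    if is_text:
        loss += w_trans * smooth_l1(theta_pred, theta_gt)
        loss += w_coord * smooth_l1(coords_pred, coords_gt)
    return loss


if __name__ == "__main__":
    # A 3x3 grid rotated by 90 degrees and stretched 4x along its own x axis.
    grid = regular_grid(3)
    theta = rotation_scale(math.pi / 2, 4.0, 1.0)
    print(warp_grid(grid, theta, (10.0, 10.0)))
```

Gating the regression terms on the text label follows common practice in detection heads, where geometry is regressed only for positive samples. Whether the ITN applies exactly this gating or these weights is an assumption of this sketch.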