Abstract
In this paper we introduce a new method for text detection in natural images. The method comprises two contributions: First, a fast and scalable engine to generate synthetic images of text in clutter. This engine overlays synthetic text onto existing background images in a natural way, accounting for the local 3D scene geometry. Second, we use the synthetic images to train a Fully-Convolutional Regression Network (FCRN) which efficiently performs text detection and bounding-box regression at all locations and multiple scales in an image. We discuss the relation of FCRN to the recently-introduced YOLO detector, as well as other end-to-end object detection systems based on deep learning. The resulting detection network significantly outperforms current methods for text detection in natural images, achieving an F-measure of 84.2% on the standard ICDAR 2013 benchmark. Furthermore, it can process 15 images per second on a GPU.