Abstract
In this paper, we propose a novel approach for text detection in natural images. Both local and global cues are taken into account for localizing text lines in a coarse-to-fine procedure. First, a Fully Convolutional Network (FCN) model is trained to predict the salient map of text regions in a holistic manner. Then, text line hypotheses are estimated by combining the salient map and character components. Finally, another FCN classifier is used to predict the centroid of each character, in order to remove the false hypotheses. The framework is general for handling text in multiple orientations, languages and fonts. The proposed method consistently achieves state-of-the-art performance on three text detection benchmarks: MSRA-TD500, ICDAR2015 and ICDAR2013.