Abstract
Text detection in natural scene images is a key prerequisite for computer vision tasks such as image search, blind navigation, autopilot, and multi-language translation. Existing text detection methods often detect only part of a large-scale text region and struggle to detect small-scale text. To address this problem, we propose an anchor-free multi-orientation text detection method. First, a Feature Pyramid Network (FPN) combines multiple feature layers of a Convolutional Neural Network (CNN) to predict the geometric properties of text; this expands the receptive field of each pixel and thus helps detect more large-scale text. Second, a new loss function that is independent of text scale is designed; it gives pixels in small-scale text a larger weight in the loss computation, which facilitates the detection of small-scale text. Finally, the results of pixel-level semantic segmentation are used to filter out obviously unreasonable candidate text boxes, which improves both the accuracy and the recall of text detection. Experimental results on ICDAR 2015 and MSRA-TD500 demonstrate the good performance of our method.
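The abstract does not state the form of the scale-independent loss, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes a hypothetical integer instance map (0 for background, k > 0 for the k-th text instance) and gives each pixel a weight of 1/area of its instance, so every text instance contributes the same total weight regardless of its size. The function names `instance_balanced_weights` and `weighted_l1_loss`, and the choice of an L1 error term, are illustrative assumptions.

```python
import numpy as np


def instance_balanced_weights(instance_map: np.ndarray) -> np.ndarray:
    """Return per-pixel weights such that each text instance sums to 1.

    instance_map: 2-D integer array; 0 marks background and k > 0 marks
    pixels belonging to the k-th text instance (an assumed input format).
    """
    weights = np.zeros(instance_map.shape, dtype=np.float64)
    ids, counts = np.unique(instance_map, return_counts=True)
    for inst_id, area in zip(ids, counts):
        if inst_id == 0:
            continue  # background pixels get zero weight in this sketch
        # Pixels in a small instance receive a larger weight than pixels
        # in a large instance, because the weight is 1 / area.
        weights[instance_map == inst_id] = 1.0 / area
    return weights


def weighted_l1_loss(pred: np.ndarray, target: np.ndarray,
                     weights: np.ndarray) -> float:
    """Weighted mean absolute error over the weighted pixels."""
    total = weights.sum()
    if total == 0:
        return 0.0  # no text pixels, nothing to penalise
    return float((weights * np.abs(pred - target)).sum() / total)


# Toy example: one large instance (8 pixels) and one small instance (2 pixels).
demo_map = np.zeros((4, 4), dtype=int)
demo_map[0:2, 0:4] = 1
demo_map[3, 0:2] = 2
demo_weights = instance_balanced_weights(demo_map)
```

In this toy map, a pixel of the 2-pixel instance is weighted four times as heavily as a pixel of the 8-pixel instance, while both instances carry the same total weight.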
Funding
This work was supported in part by the Natural Science Foundation of Lingnan Normal University under Grant QL1307, in part by the Key Laboratory of Special Child Development and Education of Guangdong Province, in part by the National Social Science Foundation of China under Grant 61302399, in part by the Natural Science Foundation of China under Grant 61962038, and in part by the Guangxi Bagui Teams for Innovation and Research.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Lu, L., Wu, D., Wu, T. et al. Anchor-free multi-orientation text detection in natural scene images. Appl Intell 50, 3623–3637 (2020). https://doi.org/10.1007/s10489-020-01742-z
DOI: https://doi.org/10.1007/s10489-020-01742-z