Scene text detection with fully convolutional neural networks

Liu, Zhandong; Zhou, Wengang; Li, Houqiang

doi:10.1007/s11042-019-7177-4

Published: 21 January 2019

Volume 78, pages 18205–18227, (2019)

Multimedia Tools and Applications

Abstract

Text detection in scene images has become a hot topic in computer vision and artificial intelligence research because of its wide range of applications and challenges. Most state-of-the-art deep learning methods for text detection rely on text bounding box regression. These methods cannot handle curved scene text well. In this paper, we propose a new framework for arbitrarily oriented text detection in natural images based on fully convolutional neural networks. The main idea is to represent a text instance in two forms: a text center block and a word stroke region. These two elements are detected by two separate fully convolutional networks. Final detections are produced by the word region surrounding box algorithm. The proposed method does not need to regress the bounding box of the text instance, mainly because the predicted text block region itself implicitly contains position and orientation information. In addition, our method handles text in different languages, arbitrary orientations, curved shapes, and various fonts. To validate the effectiveness of the proposed method, we perform experiments on three public datasets, MSRA-TD500, USTB-SV1K and ICDAR2013, and compare it with other state-of-the-art methods. Experimental results demonstrate that the proposed method achieves competitive results. Based on VGG-16, our method achieves an F-measure of 78.84% on MSRA-TD500, 59.34% on USTB-SV1K, and 88.21% on ICDAR2013.
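
The claim that a predicted region implicitly carries position and orientation can be illustrated with a small sketch. The Python code below is not the word region surrounding box algorithm from the article, whose details are not given in the abstract. It is a generic stand-in that assumes a binary mask is already available from some segmentation network, and it recovers a centroid, a dominant angle, and an axis-aligned-to-the-region box from the mask pixels using a principal-axis fit. The function names `region_pose` and `oriented_box` are hypothetical.

```python
import numpy as np


def region_pose(mask):
    """Return (centroid_xy, angle_degrees) for the foreground pixels of a binary mask.

    The angle is that of the principal axis in image coordinates
    (x to the right, y downward), folded into [-90, 90).
    """
    ys, xs = np.nonzero(mask)
    if xs.size < 2:
        raise ValueError("need at least two foreground pixels")
    pts = np.stack([xs, ys], axis=1).astype(float)
    centroid = pts.mean(axis=0)
    # 2x2 covariance of the pixel coordinates; its dominant eigenvector
    # points along the longest extent of the region.
    cov = np.cov((pts - centroid).T)
    eigvals, eigvecs = np.linalg.eigh(cov)
    major = eigvecs[:, np.argmax(eigvals)]
    angle = np.degrees(np.arctan2(major[1], major[0]))
    # An axis has no sign, so fold the angle into [-90, 90).
    angle = (angle + 90.0) % 180.0 - 90.0
    return centroid, angle


def oriented_box(mask):
    """Return the 4 corners (x, y) of a box aligned with the region's principal axis.

    This is a principal-axis box, an assumption made for illustration;
    it is not guaranteed to be the minimum-area enclosing rectangle.
    """
    ys, xs = np.nonzero(mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    centroid, angle = region_pose(mask)
    theta = np.radians(angle)
    u = np.array([np.cos(theta), np.sin(theta)])   # along the text
    v = np.array([-np.sin(theta), np.cos(theta)])  # across the text
    rel = pts - centroid
    a, b = rel @ u, rel @ v                        # coordinates in the (u, v) frame
    extents = ((a.min(), b.min()), (a.max(), b.min()),
               (a.max(), b.max()), (a.min(), b.max()))
    return np.array([centroid + s * u + t * v for s, t in extents])


if __name__ == "__main__":
    # Synthetic "text block": a horizontal bar in a 20x40 image.
    demo = np.zeros((20, 40), dtype=np.uint8)
    demo[8:12, 5:35] = 1
    print(region_pose(demo))
    print(oriented_box(demo))
```

Because the box is derived from the mask pixels alone, no box coordinates have to be regressed by the network, which is the property the abstract attributes to the proposed representation. A curved text instance would need a different post-processing step than this single-rectangle fit.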

Acknowledgements

This work was supported in part by grants to Zhandong Liu from the Natural Science Foundation of China under contracts No. 61662082 and No. U1703261; in part by grants to Dr. Wengang Zhou from the Natural Science Foundation of China under contract No. 61632019, the Fundamental Research Funds for the Central Universities, and the Young Elite Scientists Sponsorship Program by CAST (No. 2016QNRC001); and in part by grants to Dr. Houqiang Li from the Natural Science Foundation of China under contracts No. 61836011 and No. 61390514.

Author information

Authors and Affiliations

CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei, 230027, China
Zhandong Liu, Wengang Zhou & Houqiang Li

Corresponding author

Correspondence to Houqiang Li.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liu, Z., Zhou, W. & Li, H. Scene text detection with fully convolutional neural networks. Multimed Tools Appl 78, 18205–18227 (2019). https://doi.org/10.1007/s11042-019-7177-4

Received: 19 May 2018
Revised: 04 November 2018
Accepted: 06 January 2019
Published: 21 January 2019
Issue Date: 15 July 2019
DOI: https://doi.org/10.1007/s11042-019-7177-4
