Accurate Detection for Scene Texts with a Cascaded CNN Networks

Li, Jianjun; Wang, Chenyan; Luo, Zhenxing; Tang, Zhuo; Li, Haojie

doi:10.1007/978-3-319-73600-6_4

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10705)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

We propose a text detection algorithm that accurately and reliably determines the bounding regions of text in natural scenes. The system cascades two convolutional neural networks to achieve high precision, recall and F-score (PRF) in text detection. The first, a fully convolutional network, acts as a coarse detector that detects and segments text-like areas. The second network filters out non-text segment blocks and accurately localizes each text line within the remaining blocks. To make the best use of the advantages of both networks, we propose an intermediate processing mechanism. The whole system is capable of detecting tightly squeezed lines of very tiny words as well as text of different sizes, especially small text. Our experimental system runs on a Titan X GPU and achieves a precision of 0.92, a recall of 0.83 and an F-score of 0.87, which places it 22nd among all published results on the ICDAR 2013 Focused Scene Text dataset benchmark.
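
The three reported scores are mutually consistent if the F-score is taken to be the usual harmonic mean of precision and recall. The abstract does not state the formula, so that definition is an assumption here, and the function name `f_score` is purely illustrative. A minimal Python check:

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed standard F1 definition)."""
    if precision + recall == 0:
        # Avoid division by zero when both scores are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for ICDAR 2013.
reported = f_score(0.92, 0.83)
print(round(reported, 2))  # prints 0.87
```

Because the harmonic mean is pulled toward the smaller of the two values, the F-score of 0.87 sits closer to the recall of 0.83 than a plain average of 0.875 would suggest.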

Author information

Authors and Affiliations

School of Computer, Hangzhou Dianzi University, Hangzhou, 310018, China
Jianjun Li & Chenyan Wang
The 36th Institute of China Electronics Technology Group Corporation, Jiaxing, China
Zhenxing Luo & Zhuo Tang
School of Software, Dalian University of Technology, Dalian, 140023, China
Haojie Li

Corresponding author

Correspondence to Jianjun Li.

Editor information

Editors and Affiliations

Alpen-Adria-Universität Klagenfurt, Klagenfurt, Austria
Klaus Schoeffmann
Chulalongkorn University, Bangkok, Thailand
Thanarat H. Chalidabhongse
City University of Hong Kong, Hong Kong, China
Chong Wah Ngo
Chulalongkorn University, Bangkok, Thailand
Supavadee Aramvith
Dublin City University, Dublin, Ireland
Noel E. O’Connor
Gwangju Institute of Science and Technology, Gwangju, Korea (Republic of)
Yo-Sung Ho
Tampere University of Technology, Tampere, Finland
Moncef Gabbouj
Rutgers University, Piscataway, New Jersey, USA
Ahmed Elgammal

About this paper

Cite this paper

Li, J., Wang, C., Luo, Z., Tang, Z., Li, H. (2018). Accurate Detection for Scene Texts with a Cascaded CNN Networks. In: Schoeffmann, K., et al. MultiMedia Modeling. MMM 2018. Lecture Notes in Computer Science, vol 10705. Springer, Cham. https://doi.org/10.1007/978-3-319-73600-6_4

DOI: https://doi.org/10.1007/978-3-319-73600-6_4
Published: 13 January 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73599-3
Online ISBN: 978-3-319-73600-6
eBook Packages: Computer Science, Computer Science (R0)
