Scene Text Detection with Cascaded Filtering and Grouping Modules

Zhang, Lifei; Xiang, Xinguang

doi:10.1007/978-981-10-8530-7_46

Part of the book series: Communications in Computer and Information Science (CCIS, volume 819)

Included in the following conference series:

International Conference on Internet Multimedia Computing and Service

Abstract

In this paper, we present a new scene text detection approach built on cascaded filtering and grouping modules. First, after Extremal Regions are extracted and filtered, a coarse-to-fine, distance-based pair validation scheme determines the pairwise relations between character candidates. Second, an additional module placed after the text-line grouping module detects text lines that contain only one or two characters. Third, a text-line-level classifier based on the similarity of characters excludes non-text objects. Experimental results on the ICDAR 2011 and ICDAR 2013 Robust Reading Competition datasets demonstrate that our method achieves state-of-the-art performance in both recall and precision.
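
The abstract names a coarse-to-fine, distance-based pair validation step followed by text-line grouping, but gives no formulas or thresholds. The Python sketch below illustrates the general idea only and is not the authors' implementation. Everything in it is an assumption made for illustration: the `Box` type, the functions `coarse_ok`, `fine_ok`, and `group_into_lines`, the specific geometric tests, and all threshold values. It assumes character candidates arrive as axis-aligned bounding boxes with positive height.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """Hypothetical character candidate: axis-aligned box, height > 0."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    @property
    def cx(self) -> float:
        return self.x + self.w / 2

    @property
    def cy(self) -> float:
        return self.y + self.h / 2


def coarse_ok(a: Box, b: Box, max_dist_factor: float = 3.0) -> bool:
    """Coarse stage (assumed): cheap rejection of pairs whose centres are far apart."""
    limit = max_dist_factor * max(a.h, b.h)
    return abs(a.cx - b.cx) <= limit and abs(a.cy - b.cy) <= limit


def fine_ok(a: Box, b: Box, max_gap_factor: float = 1.5,
            min_height_ratio: float = 0.5, min_v_overlap: float = 0.5) -> bool:
    """Fine stage (assumed): stricter checks on gap, height similarity, alignment."""
    # Horizontal gap between the boxes; negative when they overlap horizontally.
    gap = max(a.x, b.x) - min(a.x + a.w, b.x + b.w)
    height_ratio = min(a.h, b.h) / max(a.h, b.h)
    # Length of the shared vertical extent; negative when there is none.
    overlap = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return (gap <= max_gap_factor * max(a.h, b.h)
            and height_ratio >= min_height_ratio
            and overlap >= min_v_overlap * min(a.h, b.h))


def group_into_lines(boxes: list[Box]) -> list[list[int]]:
    """Merge validated pairs into groups of box indices with union-find."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(boxes)), 2):
        # The fine test runs only on pairs that survive the coarse test.
        if coarse_ok(boxes[i], boxes[j]) and fine_ok(boxes[i], boxes[j]):
            parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    candidates = [Box(0, 0, 10, 20), Box(14, 1, 10, 20),
                  Box(28, 0, 10, 19), Box(300, 200, 10, 20)]
    print(group_into_lines(candidates))
```

In this sketch, groups of size one or two are left in the output rather than discarded. A later stage, such as the short-text-line module and the text-line-level classifier the abstract describes, would decide whether to keep them.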

Acknowledgments

This work was supported in part by the Natural Science Foundation of China under Grants 61301106, 61327013, and U1611461.

Author information

Authors and Affiliations

Nanjing University of Science and Technology, Nanjing, China
Lifei Zhang & Xinguang Xiang

Authors

Lifei Zhang
Xinguang Xiang

Corresponding author

Correspondence to Xinguang Xiang.

Editor information

Editors and Affiliations

Multimedia Communications Department, EURECOM, Sophia Antipolis, France
Benoit Huet
Shandong University, Qingdao, China
Liqiang Nie
Hefei University of Technology, Hefei, China
Richang Hong

About this paper

Cite this paper

Zhang, L., Xiang, X. (2018). Scene Text Detection with Cascaded Filtering and Grouping Modules. In: Huet, B., Nie, L., Hong, R. (eds) Internet Multimedia Computing and Service. ICIMCS 2017. Communications in Computer and Information Science, vol 819. Springer, Singapore. https://doi.org/10.1007/978-981-10-8530-7_46

DOI: https://doi.org/10.1007/978-981-10-8530-7_46
Published: 01 March 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8529-1
Online ISBN: 978-981-10-8530-7
eBook Packages: Computer Science (R0)
