Robust Scene Text Detection for Multi-script Languages Using Deep Learning

Liu, Ruo-Ze; Sun, Xin; Xu, Hailiang; Shivakumara, Palaiahnakote; Su, Feng; Lu, Tong; Yang, Ruoyu

doi:10.1007/978-3-319-51811-4_27

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10132)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

Text detection in natural images is in high demand for many real-life applications such as image retrieval and self-navigation. This work addresses robust text detection in natural scene images, especially for multi-script text. Unlike existing works that treat multi-script characters as groups of text fragments, we treat them as non-connected components. Specifically, we first propose a novel representation named Linked Extremal Regions (LER) to extract full characters instead of fragments of scene characters. Second, we propose a two-stage convolutional neural network for discriminating multi-script text in images with cluttered backgrounds, for more robust text detection. Experimental results on three well-known datasets, namely ICDAR 2011, ICDAR 2013 and MSRA-TD500, demonstrate that the proposed method outperforms state-of-the-art methods and is language independent.
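The abstract does not define how extremal regions are linked, so the following is only a minimal sketch of the general idea of grouping non-connected components into one character candidate. It assumes each region is given as an axis-aligned bounding box and that two regions are linked when the gap between them is small; the box format, the `max_gap` threshold and the function names are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[int, int, int, int]


def gap(a: Box, b: Box) -> int:
    """Largest axis-wise gap between two boxes; 0 if they touch or overlap."""
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    return max(dx, dy)


def link_regions(boxes: List[Box], max_gap: int) -> List[List[int]]:
    """Group box indices that are transitively within max_gap of each other.

    This is an illustrative proximity rule, not the LER criterion itself.
    """
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if gap(boxes[i], boxes[j]) <= max_gap:
                parent[find(i)] = find(j)

    groups: Dict[int, List[int]] = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def merged_box(boxes: List[Box], group: List[int]) -> Box:
    """Bounding box that encloses every region in one linked group."""
    return (
        min(boxes[i][0] for i in group),
        min(boxes[i][1] for i in group),
        max(boxes[i][2] for i in group),
        max(boxes[i][3] for i in group),
    )


# Two stacked strokes of one character plus a distant, unrelated region.
example = [(0, 0, 10, 10), (0, 12, 10, 30), (50, 0, 60, 30)]
linked = link_regions(example, max_gap=3)
candidates = [merged_box(example, g) for g in linked]
```

In this toy input the two stacked strokes fall into one group and the distant region stays separate, which is the behaviour a full-character representation needs for scripts whose characters consist of several disconnected strokes.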

Acknowledgments

The work described in this paper was supported by the Natural Science Foundation of China under Grant Nos. 61672273, 61272218 and 61321491, and by the Science Foundation for Distinguished Young Scholars of Jiangsu under Grant No. BK20160021.

Author information

Authors and Affiliations

National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China
Ruo-Ze Liu, Xin Sun, Hailiang Xu, Feng Su, Tong Lu & Ruoyu Yang
Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia
Palaiahnakote Shivakumara

Corresponding author

Correspondence to Tong Lu.

Editor information

Editors and Affiliations

CNRS–IRISA, Rennes, France
Laurent Amsaleg
Reykjavík University, Reykjavik, Iceland
Gylfi Þór Guðmundsson
Dublin City University, Dublin, Ireland
Cathal Gurrin
Reykjavík University, Reykjavik, Iceland
Björn Þór Jónsson
National Institute of Informatics, Tokyo, Japan
Shin’ichi Satoh

About this paper

Cite this paper

Liu, RZ. et al. (2017). Robust Scene Text Detection for Multi-script Languages Using Deep Learning. In: Amsaleg, L., Guðmundsson, G., Gurrin, C., Jónsson, B., Satoh, S. (eds) MultiMedia Modeling. MMM 2017. Lecture Notes in Computer Science, vol 10132. Springer, Cham. https://doi.org/10.1007/978-3-319-51811-4_27

DOI: https://doi.org/10.1007/978-3-319-51811-4_27
Published: 31 December 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-51810-7
Online ISBN: 978-3-319-51811-4
eBook Packages: Computer Science, Computer Science (R0)
