Real-time localization of multi-oriented text in natural scene images using a linear spatial filter

  • Original Research Paper
Journal of Real-Time Image Processing

Abstract

This paper proposes a multi-oriented text localization method for natural images that is suitable for real-time processing of high-definition video on portable and mobile devices. Our method is based on the connected component (CC) approach. First, CC are isolated by convolving each level of a multi-scale image pyramid with a specifically designed linear spatial filter, followed by hysteresis thresholding. Next, non-textual CC are pruned by a local classifier consisting of a cascade of multilayer perceptrons (MLPs) fed with increasingly extended feature vectors. The stroke width feature is estimated in linear time by computing the maximal inscribed squares in each CC. Candidate CC are then checked by a more context-aware neural network classifier that takes into account both the target CC and its neighbors. Finally, text sequences are extracted at all pyramid levels and fused using dynamic programming. The main contribution of this work is execution speed: the CPU-only parallel implementation of the proposed method processes 1080p HD video at nearly 30 frames per second on a standard laptop. Furthermore, when benchmarked on the ICDAR 2013 Robust Reading and ICDAR 2015 Incidental Scene Text data sets, our system runs more than twice as fast as the state of the art while still delivering competitive precision and recall.
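
The first stage (filtering every pyramid level with a linear kernel) can be pictured with a small, dependency-free sketch. It assumes a factor-2, box-averaged pyramid and uses a discrete 3x3 Laplacian purely as a hypothetical stand-in for the paper's filter, whose coefficients are not given in this abstract; `convolve3x3`, `downsample2` and `filtered_pyramid` are illustrative names.

```python
from typing import List

Image = List[List[float]]

# Hypothetical stand-in for the paper's "specifically designed linear spatial
# filter": a discrete 3x3 Laplacian (the real coefficients are not given here).
KERNEL: Image = [
    [0.0, -1.0, 0.0],
    [-1.0, 4.0, -1.0],
    [0.0, -1.0, 0.0],
]


def convolve3x3(img: Image, kernel: Image = KERNEL) -> Image:
    """2D convolution with a 3x3 kernel; out-of-range pixels replicate the border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to the image
                    xx = min(max(x + dx, 0), w - 1)
                    # kernel is flipped, so this is convolution, not correlation
                    acc += kernel[1 - dy][1 - dx] * img[yy][xx]
            out[y][x] = acc
    return out


def downsample2(img: Image) -> Image:
    """Halve each dimension by averaging non-overlapping 2x2 blocks."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(w)]
            for y in range(h)]


def filtered_pyramid(img: Image, levels: int) -> List[Image]:
    """Filter response at each pyramid level, finest level first."""
    responses: List[Image] = []
    current = img
    for _ in range(levels):
        responses.append(convolve3x3(current))
        if len(current) < 4 or len(current[0]) < 4:
            break  # the next level would be smaller than 2x2
        current = downsample2(current)
    return responses
```

Because the filter is linear and small, its cost is a fixed number of multiply-adds per pixel, and each factor-2 level has a quarter of the pixels of the level above it.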
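
Hysteresis thresholding of a filter response into connected components can be sketched as below. The `low`/`high` thresholds, 8-connectivity and the function name are assumptions for illustration; handling the opposite contrast polarity (for example by running the same routine on the negated response) is likewise assumed rather than taken from the paper.

```python
from collections import deque
from typing import List, Tuple

Pixel = Tuple[int, int]


def hysteresis_components(resp: List[List[float]], low: float,
                          high: float) -> List[List[Pixel]]:
    """8-connected components obtained by hysteresis thresholding.

    Each component is seeded by a pixel with resp >= high and grown
    breadth-first through neighbours with resp >= low. Weak pixels that are
    not connected to any strong pixel are discarded."""
    h, w = len(resp), len(resp[0])
    visited = [[False] * w for _ in range(h)]
    components: List[List[Pixel]] = []
    for y in range(h):
        for x in range(w):
            if visited[y][x] or resp[y][x] < high:
                continue  # not an unvisited strong seed
            visited[y][x] = True
            queue = deque([(y, x)])
            comp: List[Pixel] = []
            while queue:
                cy, cx = queue.popleft()
                comp.append((cy, cx))
                for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                    for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                        if not visited[ny][nx] and resp[ny][nx] >= low:
                            visited[ny][nx] = True
                            queue.append((ny, nx))
            components.append(comp)
    return components
```

The low threshold lets weaker strokes join the component started by a character's strongest response, while isolated weak noise never forms a component of its own.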
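
The point of a cascade fed with increasingly extended feature vectors is early rejection: cheap features are computed for every CC, and costlier ones only for CC that survive the earlier stages. The sketch assumes a one-hidden-layer MLP with tanh units and a sigmoid output; `TinyMLP`, `cascade_accepts` and the per-stage thresholds are hypothetical, and any weights used with it (as in a quick test) are hand-picked placeholders, not trained values.

```python
import math
from typing import Callable, List, Sequence, Tuple


class TinyMLP:
    """One hidden tanh layer, sigmoid output (hypothetical architecture)."""

    def __init__(self, w1: List[List[float]], b1: List[float],
                 w2: List[float], b2: float):
        self.w1, self.b1, self.w2, self.b2 = w1, b1, w2, b2

    def __call__(self, x: Sequence[float]) -> float:
        hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        z = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return 1.0 / (1.0 + math.exp(-z))  # "text" score in (0, 1)


# (extra-feature extractor, classifier for the cumulative vector, threshold)
Stage = Tuple[Callable[[object], List[float]], TinyMLP, float]


def cascade_accepts(cc: object, stages: List[Stage]) -> bool:
    """Run stages in order, appending each stage's features to the vector.

    A candidate is rejected as soon as one stage scores it below its
    threshold, so the features of later stages are never computed for it."""
    features: List[float] = []
    for extra_features, mlp, threshold in stages:
        features = features + extra_features(cc)
        if mlp(features) < threshold:
            return False
    return True
```

Ordering the stages by feature cost is what makes such a cascade fast: most background CC are discarded after the first, cheapest stage.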
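
The largest inscribed square ending at every pixel of a CC mask can be found in a single raster pass with the classic dynamic-programming recurrence side(y, x) = 1 + min(side(y-1, x), side(y, x-1), side(y-1, x-1)), which is linear in the number of pixels. How the paper turns these squares into a stroke width value is not described in this abstract, so the hypothetical `stroke_width_estimate` below simply returns the largest side. That simplification is exact for axis-aligned strokes but gives about 0.71 times the true width for 45° strokes.

```python
from typing import List


def max_square_sides(mask: List[List[int]]) -> List[List[int]]:
    """side[y][x] = side of the largest all-foreground square whose
    bottom-right corner is (y, x); one raster pass, O(H*W) overall."""
    h, w = len(mask), len(mask[0])
    side = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue  # background pixels keep side 0
            if y == 0 or x == 0:
                side[y][x] = 1
            else:
                side[y][x] = 1 + min(side[y - 1][x], side[y][x - 1],
                                     side[y - 1][x - 1])
    return side


def stroke_width_estimate(mask: List[List[int]]) -> int:
    """Hypothetical estimator: side of the largest inscribed square in the CC."""
    return max((max(row) for row in max_square_sides(mask)), default=0)
```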
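
The abstract does not specify what the dynamic programme optimizes. As a generic, one-dimensional stand-in (not the authors' formulation), the sketch below fuses overlapping candidates, for example text segments found at different pyramid levels and projected onto a common axis, by weighted interval scheduling: it keeps the non-overlapping subset with maximal total score. The candidate tuple layout and `fuse_candidates` are assumptions.

```python
from bisect import bisect_right
from typing import List, Tuple

Candidate = Tuple[float, float, float]  # (start, end, score); end is exclusive


def fuse_candidates(cands: List[Candidate]) -> Tuple[float, List[Candidate]]:
    """Weighted interval scheduling: choose non-overlapping candidates with
    maximal total score, O(n log n) after sorting by end coordinate."""
    cands = sorted(cands, key=lambda c: c[1])
    ends = [c[1] for c in cands]
    n = len(cands)
    best = [0.0] * (n + 1)  # best[i]: optimum over the first i candidates
    take = [False] * (n + 1)
    prev = [0] * (n + 1)
    for i in range(1, n + 1):
        start, _, score = cands[i - 1]
        # count of earlier candidates ending at or before this one starts
        p = bisect_right(ends, start, 0, i - 1)
        prev[i] = p
        if best[p] + score > best[i - 1]:
            best[i], take[i] = best[p] + score, True
        else:
            best[i] = best[i - 1]
    chosen: List[Candidate] = []
    i = n
    while i > 0:  # backtrack through the decisions
        if take[i]:
            chosen.append(cands[i - 1])
            i = prev[i]
        else:
            i -= 1
    return best[n], chosen[::-1]
```

With candidates (0, 10, 5.0), (0, 4, 3.0), (5, 10, 3.0) and (12, 20, 1.0), the two shorter segments together outscore the single long one, so the fused result keeps (0, 4), (5, 10) and (12, 20) for a total score of 7.0.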
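
The real-time claim translates into a concrete budget: at 30 frames per second, each 1920x1080 frame gets about 33.3 ms for the whole pipeline. The pyramid ratio of 0.5 below is an assumption; with it, the levels together hold at most 1/(1 - 0.25) = 4/3 of the base-level pixels (the geometric-series limit, an upper bound for a finite pyramid).

```python
def frame_budget(width: int, height: int, fps: float,
                 pyramid_ratio: float = 0.5):
    """Return (pixels per frame, ms per frame, filtered pixels per second).

    pyramid_ratio is the assumed linear scale factor between pyramid levels;
    summing the geometric series of level areas gives 1 / (1 - ratio**2)
    times the base-level pixel count."""
    pixels = width * height
    budget_ms = 1000.0 / fps
    pyramid_pixels = pixels / (1.0 - pyramid_ratio ** 2)
    return pixels, budget_ms, pyramid_pixels * fps


pixels, budget_ms, rate = frame_budget(1920, 1080, 30)
print(pixels, round(budget_ms, 1), round(rate / 1e6, 1))  # prints: 2073600 33.3 82.9
```

Under these assumptions the filtering stage alone must touch roughly 83 million pixels per second, which is why a small linear filter and early CC rejection matter on a CPU.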

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9

Notes

  1. Physics constants have been removed.

  2. 2D vectors in the cross product are extended to 3D by setting z to 0.
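
The convention in note 2 means that, for two planar vectors, the x and y components of the 3D cross product vanish, so only the scalar z-component needs computing. A minimal sketch; `cross2` and `orientation` are illustrative names, and the orientation sense assumes a y-up frame (it flips in image coordinates, where y grows downwards).

```python
from typing import Tuple

Vec2 = Tuple[float, float]


def cross2(a: Vec2, b: Vec2) -> float:
    """z-component of (a_x, a_y, 0) x (b_x, b_y, 0); the x and y parts are 0."""
    return a[0] * b[1] - a[1] * b[0]


def orientation(p: Vec2, q: Vec2, r: Vec2) -> int:
    """Sign of cross2(q - p, r - p): +1 counter-clockwise (y up),
    -1 clockwise, 0 collinear."""
    c = cross2((q[0] - p[0], q[1] - p[1]), (r[0] - p[0], r[1] - p[1]))
    return (c > 0) - (c < 0)
```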

References

  1. Weinman, J.J., Learned-Miller, E., Hanson, A.R.: Scene text recognition using similarity and a lexicon with sparse belief propagation. IEEE Trans. Pattern Anal. Mach. Intell. 31(10), 1733–1746 (2009). https://doi.org/10.1109/TPAMI.2009.38

  2. Jiao, J., Ye, Q., Huang, Q.: A configurable method for multi-style license plate recognition. Pattern Recognit. 42(3), 358–369 (2009). https://doi.org/10.1016/j.patcog.2008.08.016

  3. Park, J., Lee, G., Kim, E., Lim, J., Kim, S., Yang, H., Lee, M., Hwang, S.: Automatic detection and recognition of Korean text in outdoor signboard images. Pattern Recognit. Lett. 31(12), 1728–1739 (2010). https://doi.org/10.1016/j.patrec.2010.05.024

  4. Liu, X., Wang, W., Zhu, T.: Extracting captions in complex background from videos. In: ICPR, pp. 3232–3235 (2010). https://doi.org/10.1109/ICPR.2010.790

  5. Yi, C., Tian, Y.: Assistive text reading from complex background for blind persons. In: CBDAR’11, pp. 15–28. Springer (2012). https://doi.org/10.1007/978-3-642-29364-1_2

  6. Wang, K., Babenko, B., Belongie, S.: End-to-end scene text recognition. In: ICCV, pp. 1457–1464 (2011). https://doi.org/10.1109/ICCV.2011.6126402

  7. Tian, S., Pan, Y., Huang, C., Lu, S., Yu, K., Tan, C.L.: Text flow: a unified text detection system in natural scene images. In: ICCV, pp. 4651–4659 (2015). https://doi.org/10.1109/ICCV.2015.528

  8. Zhang, Z., Shen, W., Yao, C., Bai, X.: Symmetry-based text line detection in natural scenes. In: CVPR, pp. 2558–2567 (2015). https://doi.org/10.1109/CVPR.2015.7298871

  9. Qi, Z.K., Kimachi, M., Wu, Y., Aziwa, T.: Using Adaboost to detect and segment characters from natural scenes. In: Proceedings of CBDAR, ICDAR Workshop (2005)

  10. Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: CVPR, pp. 2963–2970 (2010). https://doi.org/10.1109/CVPR.2010.5540041

  11. Neumann, L., Matas, J.: Text localization in real-world images using efficiently pruned exhaustive search. In: ICDAR, pp. 687–691 (2011). https://doi.org/10.1109/ICDAR.2011.144

  12. Koo, H.I., Kim, D.H.: Scene text detection via connected component clustering and nontext filtering. IEEE Trans. Image Process. 22(6), 2296–2305 (2013). https://doi.org/10.1109/TIP.2013.2249082

  13. Yin, X.C., Yin, X., Huang, K., Hao, H.W.: Robust text detection in natural scene images. IEEE Trans. Pattern Anal. Mach. Intell. 36(5), 970–983 (2014). https://doi.org/10.1109/TPAMI.2013.182

  14. Lu, S., Chen, T., Tian, S., Lim, J.H., Tan, C.L.: Scene text extraction based on edges and support vector regression. IJDAR 18(2), 125–135 (2015). https://doi.org/10.1007/s10032-015-0237-z

  15. Qin, S., Manduchi, R.: A fast and robust text spotter. In: WACV, pp. 1–8 (2016). https://doi.org/10.1109/WACV.2016.7477663

  16. Tian, C., Xia, Y., Zhang, X., Gao, X.: Natural scene text detection with MC-MR candidate extraction and coarse-to-fine filtering. Neurocomputing 260, 112–122 (2017). https://doi.org/10.1016/j.neucom.2017.03.078

  17. Wei, Y., Shen, W., Zeng, D., Ye, L., Zhang, Z.: Multi-oriented text detection from natural scene images based on a CNN and pruning non-adjacent graph edges. Signal Process. Image Commun. 64, 89–98 (2018). https://doi.org/10.1016/j.image.2018.02.016

  18. Tian, Z., Huang, W., He, T., He, P., Qiao, Y.: Detecting text in natural image with connectionist text proposal network. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) Computer Vision—ECCV 2016, Lecture Notes in Computer Science, vol. 9912, pp. 56–72. Springer International Publishing (2016). https://doi.org/10.1007/978-3-319-46484-8_4

  19. Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3482–3490 (2017). https://doi.org/10.1109/CVPR.2017.371

  20. Tang, Y., Wu, X.: Scene text detection and segmentation based on cascaded convolution neural networks. IEEE Trans. Image Process. 26(3), 1509–1520 (2017). https://doi.org/10.1109/TIP.2017.2656474

  21. He, W., Zhang, X.Y., Yin, F., Liu, C.L.: Deep direct regression for multi-oriented scene text detection. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 745–753 (2017). https://doi.org/10.1109/ICCV.2017.87

  22. Hu, H., Zhang, C., Luo, Y., Wang, Y., Han, J., Ding, E.: WordSup: exploiting word annotations for character based text detection. In: 2017 IEEE International Conference on Computer Vision (ICCV), pp. 4950–4959 (2017). https://doi.org/10.1109/ICCV.2017.529

  23. Zhou, X., Yao, C., Wen, H., Wang, Y., Zhou, S., He, W., Liang, J.: EAST: An efficient and accurate scene text detector. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2642–2651 (2017). https://doi.org/10.1109/CVPR.2017.283

  24. Ma, J., Shao, W., Ye, H., Wang, L., Wang, H., Zheng, Y., Xue, X.: Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans. Multimed. 20(11), 3111–3122 (2018). https://doi.org/10.1109/TMM.2018.2818020

  25. Lyu, P., Liao, M., Yao, C., Wu, W., Bai, X.: Mask TextSpotter: an end-to-end trainable neural network for spotting text with arbitrary shapes. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision—ECCV 2018, Lecture Notes in Computer Science, vol. 11218, pp. 71–88. Springer International Publishing (2018). https://doi.org/10.1007/978-3-030-01264-9_5

  26. Long, S., Ruan, J., Zhang, W., He, X., Wu, W., Yao, C.: TextSnake: a flexible representation for detecting text of arbitrary shapes. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) Computer Vision—ECCV 2018, Lecture Notes in Computer Science, pp. 19–35. Springer International Publishing

  27. Mohanty, S., Dutta, T., Gupta, H.P.: Recurrent global convolutional network for scene text detection. In: 2018 25th IEEE International Conference on Image Processing (ICIP), pp. 2750–2754 (2018). https://doi.org/10.1109/ICIP.2018.8451058

  28. Liao, M., Shi, B., Bai, X.: TextBoxes++: a single-shot oriented scene text detector. IEEE Trans. Image Process. 27(8), 3676–3690 (2018). https://doi.org/10.1109/TIP.2018.2825107

  29. Jenq, J., Sahni, S.: Serial and parallel algorithms for the medial axis transform. IEEE Trans. Pattern Anal. Mach. Intell. 14(12), 1218–1224 (1992). https://doi.org/10.1109/34.177389

  30. Gironés, X., Julià, C.: Real-time text localization in natural scene images using a linear spatial filter. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), IEEE, Kyoto, pp. 1261–1268 (2017). https://doi.org/10.1109/ICDAR.2017.208

  31. Karatzas, D., Gomez-Bigorda, L., Nicolaou, A., Ghosh, S., Bagdanov, A., Iwamura, M., Matas, J., Neumann, L., Chandrasekhar, V.R., Lu, S., Shafait, F., Uchida, S., Valveny, E.: ICDAR 2015 competition on robust reading. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1156–1160 (2015). https://doi.org/10.1109/ICDAR.2015.7333942

  32. Liu, X., Fu, H., Jia, Y.: Gaussian mixture modeling and learning of neighboring characters for multilingual text extraction in images. Pattern Recognit. 41(2), 484–493 (2008). https://doi.org/10.1016/j.patcog.2007.06.004

  33. He, X., Song, Y., Zhang, Y.: A coarse-to-fine scene text detection method based on skeleton-cut detector and binary-tree-search based rectification. Pattern Recognit. Lett. 112, 27–33 (2018). https://doi.org/10.1016/j.patrec.2018.05.020

  34. Yi, C., Tian, Y.: Localizing text in scene images by boundary clustering, stroke segmentation, and string fragment classification. IEEE Trans. Image Process. 21(9), 4256–4268 (2012). https://doi.org/10.1109/TIP.2012.2199327

  35. Neumann, L., Matas, J.: A method for text localization and recognition in real-world images. In: ACCV, 6494, pp. 770–783. Springer, Berlin (2010). https://doi.org/10.1007/978-3-642-19318-7_60

  36. Neumann, L., Matas, J.: Real-time scene text localization and recognition. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3538–3545 (2012). https://doi.org/10.1109/CVPR.2012.6248097

  37. Buta, M., Neumann, L., Matas, J.: FASText: efficient unconstrained scene text detector. In: ICCV, pp. 1206–1214 (2015). https://doi.org/10.1109/ICCV.2015.143

  38. Fabrizio, J., Robert-Seidowsky, M., Dubuisson, S., Calarasanu, S., Boissel, R.: TextCatcher: a method to detect curved and challenging text in natural scenes. IJDAR 19(2), 99–117 (2016). https://doi.org/10.1007/s10032-016-0264-4

  39. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide-baseline stereo from maximally stable extremal regions. Image Vis. Comput. 22(10), 761–767 (2004). https://doi.org/10.1016/j.imavis.2004.02.006

  40. Zamberletti, A., Noce, L., Gallo, I.: Text localization based on fast feature pyramids and multi-resolution maximally stable extremal regions. In: ACCV Workshops, 9009, pp. 91–105. Springer (2014). https://doi.org/10.1007/978-3-319-16631-5_7

  41. Neumann, L., Matas, J.: Efficient scene text localization and recognition with local character refinement. In: ICDAR, pp. 746–750 (2015). https://doi.org/10.1109/ICDAR.2015.7333861

  42. Zhu, R., Mao, X.J., Zhu, Q.H., Li, N., Yang, Y.B.: Text detection based on convolutional neural networks with spatial pyramid pooling. In: ICIP, pp. 1032–1036 (2016). https://doi.org/10.1109/ICIP.2016.7532514

  43. Nistér, D., Stewénius, H.: Linear time maximally stable extremal regions. In: ECCV, pp. 183–196 (2008). https://doi.org/10.1007/978-3-540-88688-4_14

  44. Karatzas, D., Shafait, F., Uchida, S., Iwamura, M., i Bigorda, L.G., Mestre, S.R., Mas, J., Mota, D.F., Almazàn, J.A., de las Heras, L.P.: ICDAR 2013 robust reading competition. In: ICDAR, pp. 1484–1493 (2013). https://doi.org/10.1109/ICDAR.2013.221

  45. Khurshid, K., Siddiqi, I., Faure, C., Vincent, N.: Comparison of Niblack inspired binarization methods for ancient documents. In: SPIE, vol. 7247, pp. 72470U–72470U–9 (2009). https://doi.org/10.1117/12.805827

  46. Niblack, W.: An Introduction to Digital Image Processing, First English edn. Prentice Hall, Upper Saddle River (1986)

  47. Pan, Y.F., Hou, X., Liu, C.L.: A hybrid approach to detect and localize texts in natural scene images. IEEE Trans. Image Process. 20(3), 800–813 (2011). https://doi.org/10.1109/TIP.2010.2070803

  48. Wang, L., Fan, W., He, Y., Sun, J., Katsuyama, Y., Hotta, Y.: Fast and accurate text detection in natural scene images with user-intention. In: ICPR, pp. 2920–2925 (2014). https://doi.org/10.1109/ICPR.2014.503

  49. Sauvola, J., Pietikäinen, M.: Adaptive document image binarization. Pattern Recognit. 33(2), 225–236 (2000). https://doi.org/10.1016/S0031-3203(99)00055-2

  50. Rodtook, S., Rangsanseri, Y.: Adaptive thresholding of document images based on Laplacian sign. In: Proceedings International Conference on Information Technology: Coding and Computing, pp. 501–505 (2001). https://doi.org/10.1109/ITCC.2001.918846

  51. Howe, N.R.: A Laplacian energy for document binarization. In: ICDAR, pp. 6–10 (2011). https://doi.org/10.1109/ICDAR.2011.11

  52. Zhang, Y., Lai, J.: Arbitrarily oriented text detection using geodesic distances between corners and skeletons. In: ICPR, pp. 1896–1899 (2012)

  53. Shivakumara, P., Phan, T.Q., Tan, C.L.: A Laplacian approach to multi-oriented text detection in video. IEEE Trans. Pattern Anal. Mach. Intell. 33(2), 412–419 (2011). https://doi.org/10.1109/TPAMI.2010.166

  54. Liu, Y., Zhang, D., Zhang, Y., Lin, S.: Real-time scene text detection based on stroke model. In: ICPR, pp. 3116–3120 (2014). https://doi.org/10.1109/ICPR.2014.537

  55. Schwarz, C., Teich, J., Welzl, E., Evans, B.: On Finding a Minimal Enclosing Parallelogram. Tech. Rep. TR-94-036, International Computer Science Institute, Berkeley (1994)

  56. Girones, X., Julia, C., Puig, D.: Full quadrant approximations for the arctangent function [tips and tricks]. IEEE Signal Process. Mag. 30(1), 130–135 (2013). https://doi.org/10.1109/MSP.2012.2219677

  57. Chen, H., Tsai, S., Schroth, G., Chen, D., Grzeszczuk, R., Girod, B.: Robust text detection in natural images with edge-enhanced maximally stable extremal regions. In: Image Processing (ICIP), 2011 18th IEEE International Conference, pp. 2609–2612 (2011). https://doi.org/10.1109/ICIP.2011.6116200

  58. Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001). https://doi.org/10.1023/A:1010933404324

  59. Opitz, M., Diem, M., Fiel, S., Kleber, F., Sablatnig, R.: End-to-end text recognition using local ternary patterns, MSER and deep convolutional nets. In: 2014 11th IAPR International Workshop on Document Analysis Systems, pp. 186–190 (2014). https://doi.org/10.1109/DAS.2014.29

  60. Bernsen, J.: Dynamic thresholding of grey-level images. ICPR 2, 1251–1255 (1986)

  61. Wolf, C., Jolion, J.M., Chassaing, F.: Text localization, enhancement and binarization in multimedia documents. In: ICPR, vol. 2, pp. 1037–1040 (2002). https://doi.org/10.1109/ICPR.2002.1048482

  62. Bradley, D., Roth, G.: Adaptive thresholding using the integral image. J. Graph. Tools 12(2), 13–21 (2007). https://doi.org/10.1080/2151237X.2007.10129236

  63. Lee, S., Cho, M.S., Jung, K., Kim, J.H.: Scene text extraction with edge constraint and text collinearity. In: ICPR, pp. 3983–3986 (2010). https://doi.org/10.1109/ICPR.2010.969

  64. Cho, H., Sung, M., Jun, B.: Canny text detector: fast and robust scene text localization algorithm. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3566–3573 (2016). https://doi.org/10.1109/CVPR.2016.388

  65. Wolf, C., Jolion, J.M.: Object count/area graphs for the evaluation of object detection and segmentation algorithms. IJDAR 8(4), 280–296 (2006). https://doi.org/10.1007/s10032-006-0014-0

  66. Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: detecting scene text via instance segmentation. In: McIlraith, S.A., Weinberger, K.Q. (eds.) Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), pp. 6773–6780. AAAI Press (2018)

  67. Sosa, L.P., Lucas, S.M., Panaretos, A., Sosa, L., Tang, A., Wong, S., Young, R.: ICDAR 2003 robust reading competitions. In: Proceedings of the Seventh International Conference on Document Analysis and Recognition, vol. 2, pp. 682–687 (2003). https://doi.org/10.1109/ICDAR.2003.1227749

  68. Du, Y., Duan, G., Ai, H.: Context-based text detection in natural scenes. In: ICIP, pp. 1857–1860 (2012). https://doi.org/10.1109/ICIP.2012.6467245

  69. Mao, J., Li, H., Zhou, W., Yan, S., Tian, Q.: Scale based region growing for scene text detection. In: Proceedings of the 21st ACM International Conference on Multimedia, MM ’13, pp. 1007–1016. ACM, New York (2013). https://doi.org/10.1145/2502081.2502108

Acknowledgements

Work supported by the Spanish government under Grant TIN2016-80250-R.

Author information

Corresponding author

Correspondence to Xavier Gironés.

About this article

Cite this article

Gironés, X., Julià, C. Real-time localization of multi-oriented text in natural scene images using a linear spatial filter. J Real-Time Image Proc 17, 1505–1525 (2020). https://doi.org/10.1007/s11554-019-00911-9

  • DOI: https://doi.org/10.1007/s11554-019-00911-9