Text kernel expansion for real-time scene text detection

  • Original Article
Pattern Analysis and Applications

Abstract

Understanding textual information in natural images is fundamental for artificial intelligence systems that must comprehend and interact with their environment, and precise text detection is a prerequisite for it. In this work, we propose a real-time arbitrary-shaped scene text detector named Text Kernel Expansion (TKE). TKE employs a segmentation module to segment text kernels and then models them as control points. Using a regression-based network, TKE refines those control points through an expansion procedure, which avoids complex pixel-level post-processing and keeps the detector both efficient and accurate. Additionally, we propose an Optimal Bipartite Graph Matching Loss that measures the matching error between the refined control points and the labeled vertices and efficiently minimizes the global matching distance. Comprehensive testing on four standard benchmarks confirms that our method strikes an effective balance between accuracy and efficiency. The code for our method is available at https://github.com/TankosTao/TKE.git. All related datasets are openly available and can be downloaded through our GitHub link.
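The idea behind a bipartite matching loss can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the function names are hypothetical, the cost is plain Euclidean distance, the loss is the mean matched distance, and the optimal one-to-one assignment is found by exhaustive search rather than by whatever solver the authors use. Exhaustive search is only practical for a handful of points; a polynomial-time assignment solver such as the Hungarian method would be the usual choice in practice.

```python
from itertools import permutations
from math import hypot


def pairwise_distances(pred, target):
    """Euclidean distance between every predicted point and every labeled vertex."""
    return [[hypot(px - tx, py - ty) for tx, ty in target] for px, py in pred]


def optimal_matching_loss(pred, target):
    """Mean distance under the one-to-one assignment with the smallest total distance.

    Illustrative only: tries every permutation, so it scales factorially
    with the number of points.
    """
    if not pred or len(pred) != len(target):
        raise ValueError("expected two non-empty point lists of equal length")
    cost = pairwise_distances(pred, target)
    n = len(pred)
    # Each permutation assigns predicted point i to labeled vertex perm[i].
    best_total, best_perm = min(
        (sum(cost[i][j] for i, j in enumerate(perm)), perm)
        for perm in permutations(range(n))
    )
    return best_total / n, list(best_perm)


# Hypothetical example: the same quadrilateral, with the predicted points
# listed in a different order and slightly perturbed.
labeled = [(0.0, 0.0), (4.0, 0.0), (4.0, 2.0), (0.0, 2.0)]
predicted = [(4.0, 2.5), (0.5, 0.0), (0.0, 2.0), (4.0, 0.0)]

loss, assignment = optimal_matching_loss(predicted, labeled)
print(loss, assignment)
```

Because the assignment is chosen to minimize the total distance over all pairs at once, the loss does not depend on the order in which the predicted points or the labeled vertices happen to be listed, which is the property a global matching objective is meant to provide.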

Acknowledgements

The reported research was partly supported by the National Natural Science Foundation of China under Grant 62176030 and by the Natural Science Foundation of Chongqing under Grant cstc2021jcyj-msxmX0568.

Author information

Corresponding author

Correspondence to Sheng Huang.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

He, T., Huang, S., Tang, W. et al. Text kernel expansion for real-time scene text detection. Pattern Anal Applic 27, 141 (2024). https://doi.org/10.1007/s10044-024-01352-2

  • DOI: https://doi.org/10.1007/s10044-024-01352-2
