Text kernel expansion for real-time scene text detection

He, Tao; Huang, Sheng; Tang, Wenhao; Liu, Bo

doi:10.1007/s10044-024-01352-2

Original Article
Published: 06 November 2024

Pattern Analysis and Applications, Volume 27, article number 141 (2024)

Abstract

Understanding textual information from natural images is fundamental for artificial intelligence systems to comprehend and interact with the environment. The precise detection of text is crucial for achieving these objectives. In this work, we propose a real-time arbitrary-shaped scene text detector named Text Kernel Expansion (TKE). TKE employs a segmentation module to segment text kernels and then models them as control points. Using a regression-based network, TKE refines those control points through an expansion procedure, which avoids complex pixel-level post-processing and keeps the detector both efficient and accurate. Additionally, we propose an Optimal Bipartite Graph Matching Loss that measures the matching error between the refined control points and the labeled vertices and efficiently minimizes the global matching distance. Comprehensive testing on four standard benchmarks confirms that our method strikes an effective balance between accuracy and efficiency. The code for the proposed method is available at https://github.com/TankosTao/TKE.git. All related datasets are openly available and can be downloaded through our GitHub link.
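As a rough illustration of the matching idea in the abstract, the sketch below pairs each predicted control point with one labeled vertex so that the total distance is minimal, and reports the mean matched distance. It is not the authors' implementation: the abstract does not specify the loss, so the Euclidean distance, the equal point counts, the mean normalization, and the helper names `pairwise_l2`, `min_cost_assignment`, and `matching_loss` are all assumptions made for this example. The assignment step uses the Hungarian method in plain Python.

```python
import math


def pairwise_l2(points_a, points_b):
    """Cost matrix of Euclidean distances (assumed metric, not from the paper)."""
    return [[math.dist(a, b) for b in points_b] for a in points_a]


def min_cost_assignment(cost):
    """Hungarian method, O(n^3), for a square cost matrix.

    Returns a list `match` where match[row] is the column assigned to that row.
    """
    n = len(cost)
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (n + 1)   # column potentials (1-based)
    p = [0] * (n + 1)     # p[j] = row currently matched to column j, 0 = free
    way = [0] * (n + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:      # reached a free column: augmenting path found
                break
        while j0:               # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    match = [0] * n
    for j in range(1, n + 1):
        match[p[j] - 1] = j - 1
    return match


def matching_loss(pred_points, gt_vertices):
    """Mean distance under the optimal one-to-one matching (hypothetical form)."""
    cost = pairwise_l2(pred_points, gt_vertices)
    match = min_cost_assignment(cost)
    total = sum(cost[i][j] for i, j in enumerate(match))
    return total / len(match), match


if __name__ == "__main__":
    # Predicted points are listed in a different order than the labeled vertices.
    pred = [(2.1, 1.0), (0.0, 0.1), (0.1, 1.0), (2.0, -0.1)]
    gt = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
    loss, match = matching_loss(pred, gt)
    print(match, round(loss, 3))
```

Because the matching is solved globally, the result does not depend on the order in which the predicted points are listed. In a training setup the distances would be computed on tensors so that gradients flow through the matched pairs, and a library solver such as `scipy.optimize.linear_sum_assignment` could replace the hand-written assignment routine.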

Acknowledgements

The reported research is partly supported by the National Natural Science Foundation of China under Grant 62176030 and the Natural Science Foundation of Chongqing under Grant cstc2021jcyj-msxmX0568.

Author information

Authors and Affiliations

School of Big Data and Software Engineering, Chongqing University, No.55 Daxuecheng South Rd., Shapingba, 401331, Chongqing, China
Tao He, Sheng Huang & Wenhao Tang
Walmart Global Tech, Francisco Bay Area, San Francisco, 94087, CA, USA
Bo Liu

Corresponding author

Correspondence to Sheng Huang.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

He, T., Huang, S., Tang, W. et al. Text kernel expansion for real-time scene text detection. Pattern Anal Applic 27, 141 (2024). https://doi.org/10.1007/s10044-024-01352-2

Received: 07 April 2024
Accepted: 09 October 2024
Published: 06 November 2024
DOI: https://doi.org/10.1007/s10044-024-01352-2
