
A real-time and effective text detection method for multi-scale and fuzzy text

  • Original Research Paper
  • Published:
Journal of Real-Time Image Processing

Abstract

Text in natural scenes appears in many forms, and dynamic blur and geometric perspective greatly reduce the effectiveness of text detection. Given this situation, a real-time and effective text detection method is proposed to detect multi-scale and fuzzy text. The method applies a convolutional attention mechanism to the feature extraction backbone to obtain more informative text feature maps. To fully exploit the precise text location signals in the low-level features, a bottom-up path augmentation is used at the same time. In addition, several layers of the ResNet-50 backbone are removed to further shorten the information propagation path and balance detection speed against accuracy. For the detection output, the four vertex coordinates of each text box are regressed with the assistance of CIoU loss and shrinkage of the text labels. Our model can process an image in as little as 112 ms and achieves a higher comprehensive indicator value than the comparative models on the ICDAR 2013, ICDAR 2015, and MSRA-TD500 datasets.
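The abstract names two reusable ingredients: a convolutional attention block inserted into the backbone and a CIoU regression loss. Since the paper's code is not reproduced here, the sketch below is only illustrative of those two ideas: a CBAM-style block (channel attention followed by spatial attention) of the kind the abstract describes, and a CIoU loss written for axis-aligned (x1, y1, x2, y2) boxes as commonly defined; how the paper adapts it to the four regressed vertices is not shown. The names `ConvAttention` and `ciou_loss`, the reduction ratio 16, and the 7x7 spatial kernel are assumptions, not the authors' published design.

```python
# Illustrative sketch only; module design and hyper-parameters are assumed.
import math
import torch
import torch.nn as nn


class ConvAttention(nn.Module):
    """CBAM-style block: channel attention followed by spatial attention."""

    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        # Channel attention: shared MLP over globally average- and max-pooled features.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )
        # Spatial attention: convolution over channel-wise average and max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size=spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Re-weight channels.
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Re-weight spatial positions.
        avg_map = torch.mean(x, dim=1, keepdim=True)
        max_map = torch.amax(x, dim=1, keepdim=True)
        x = x * torch.sigmoid(self.spatial(torch.cat([avg_map, max_map], dim=1)))
        return x


def ciou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """CIoU loss for boxes given as (x1, y1, x2, y2), shape (N, 4)."""
    # IoU term.
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # Center-distance term, normalised by the enclosing box diagonal.
    c_lt = torch.min(pred[:, :2], target[:, :2])
    c_rb = torch.max(pred[:, 2:], target[:, 2:])
    diag = ((c_rb - c_lt) ** 2).sum(dim=1) + eps
    center_p = (pred[:, :2] + pred[:, 2:]) / 2
    center_t = (target[:, :2] + target[:, 2:]) / 2
    rho2 = ((center_p - center_t) ** 2).sum(dim=1)
    # Aspect-ratio consistency term.
    w_p = (pred[:, 2] - pred[:, 0]).clamp(min=eps)
    h_p = (pred[:, 3] - pred[:, 1]).clamp(min=eps)
    w_t = (target[:, 2] - target[:, 0]).clamp(min=eps)
    h_t = (target[:, 3] - target[:, 1]).clamp(min=eps)
    v = (4 / math.pi ** 2) * (torch.atan(w_t / h_t) - torch.atan(w_p / h_p)) ** 2
    alpha = v / (1 - iou + v + eps)
    return (1 - iou + rho2 / diag + alpha * v).mean()
```

In this kind of design the attention block is typically dropped in after selected backbone stages (e.g. `ConvAttention(256)` after a ResNet stage outputting 256 channels), while `ciou_loss` is applied to the regressed box geometry during training.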



Acknowledgements

This work was supported by the National Key Research and Development Program of China [No. 2018YFB1700902].

Author information


Corresponding author

Correspondence to Guoxiang Tong.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Tong, G., Dong, M. & Song, Y. A real-time and effective text detection method for multi-scale and fuzzy text. J Real-Time Image Proc 20, 13 (2023). https://doi.org/10.1007/s11554-023-01267-x
