
A real-time and effective text detection method for multi-scale and fuzzy text

  • Original Research Paper
  • Published:
Journal of Real-Time Image Processing

Abstract

Text in natural scenes appears in many forms, and dynamic blur and geometric perspective greatly reduce the effectiveness of text detection. Given this situation, a real-time and effective text detection method is proposed to detect multi-scale and fuzzy text. The method applies a convolutional attention mechanism to the feature extraction backbone to obtain more informative text feature maps. To fully exploit the precise text location signals in the low-level features, a bottom-up path augmentation is used at the same time. In addition, several layers of the ResNet-50 backbone are removed to further shorten the information propagation path and balance detection speed against accuracy. For the detection output, the four vertex coordinates of each text box are regressed with the assistance of CIoU loss and shrinkage of the text labels. Our model can process an image in as little as 112 ms and achieves a higher comprehensive indicator value than the comparative models on the ICDAR 2013, ICDAR 2015, and MSRA-TD500 datasets.
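The abstract names two reusable ingredients: a convolutional attention block inserted into the backbone and a CIoU regression loss. Since the paper's code is not reproduced here, the sketch below is only illustrative of those two ideas: a CBAM-style block (channel attention followed by spatial attention) of the kind the abstract describes, and a CIoU loss written for axis-aligned (x1, y1, x2, y2) boxes as commonly defined; how the paper adapts it to the four regressed vertices is not shown. The names `ConvAttention` and `ciou_loss`, the reduction ratio 16, and the 7x7 spatial kernel are assumptions, not the authors' published design.

```python
# Illustrative sketch only; module design and hyper-parameters are assumed.
import math
import torch
import torch.nn as nn


class ConvAttention(nn.Module):
    """CBAM-style block: channel attention followed by spatial attention."""

    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        # Channel attention: shared MLP over globally average- and max-pooled features.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )
        # Spatial attention: convolution over channel-wise average and max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size=spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Re-weight channels.
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Re-weight spatial positions.
        avg_map = torch.mean(x, dim=1, keepdim=True)
        max_map = torch.amax(x, dim=1, keepdim=True)
        x = x * torch.sigmoid(self.spatial(torch.cat([avg_map, max_map], dim=1)))
        return x


def ciou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """CIoU loss for boxes given as (x1, y1, x2, y2), shape (N, 4)."""
    # IoU term.
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # Center-distance term, normalised by the enclosing box diagonal.
    c_lt = torch.min(pred[:, :2], target[:, :2])
    c_rb = torch.max(pred[:, 2:], target[:, 2:])
    diag = ((c_rb - c_lt) ** 2).sum(dim=1) + eps
    center_p = (pred[:, :2] + pred[:, 2:]) / 2
    center_t = (target[:, :2] + target[:, 2:]) / 2
    rho2 = ((center_p - center_t) ** 2).sum(dim=1)
    # Aspect-ratio consistency term.
    w_p = (pred[:, 2] - pred[:, 0]).clamp(min=eps)
    h_p = (pred[:, 3] - pred[:, 1]).clamp(min=eps)
    w_t = (target[:, 2] - target[:, 0]).clamp(min=eps)
    h_t = (target[:, 3] - target[:, 1]).clamp(min=eps)
    v = (4 / math.pi ** 2) * (torch.atan(w_t / h_t) - torch.atan(w_p / h_p)) ** 2
    alpha = v / (1 - iou + v + eps)
    return (1 - iou + rho2 / diag + alpha * v).mean()
```

In this kind of design the attention block is typically dropped in after selected backbone stages (e.g. `ConvAttention(256)` after a ResNet stage outputting 256 channels), while `ciou_loss` is applied to the regressed box geometry during training.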



Acknowledgements

This work was supported by the National Key Research and Development Program of China [No. 2018YFB1700902].

Author information


Corresponding author

Correspondence to Guoxiang Tong.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Tong, G., Dong, M. & Song, Y. A real-time and effective text detection method for multi-scale and fuzzy text. J Real-Time Image Proc 20, 13 (2023). https://doi.org/10.1007/s11554-023-01267-x
