MS TextSpotter: An Intelligent Instance Segmentation Scheme for Semantic Scene Text Recognition in Asian Social Networks

  • Conference paper
6GN for Future Wireless Networks (6GN 2023)

Abstract

Text detection and recognition in natural scenes is an important task in computer vision. However, most text in natural scenes is curved, and text backgrounds are complex and diverse. Text detection and recognition models proposed in recent years have inherent defects; in particular, they produce many false positives in practice, which usually lowers detection and recognition accuracy. To solve this problem, we propose MS TextSpotter (Mask Scoring TextSpotter), an end-to-end trainable text detection and recognition model based on instance segmentation. First, we design a neural network based on Mask R-CNN. The network achieves accurate text detection and recognition through semantic segmentation, especially for multi-directional and curved text in natural scenes. Second, the network block we designed combines text features and predicted masks to learn the quality of text masks, and regresses the intersection-over-union (IoU) of text masks to improve the quality of character masks. The model was tested on the ICDAR2013, ICDAR2015, and Total-Text datasets. Experimental results show that the text detection recall rate increases by 1.4% and 0.2% on the first two datasets, respectively. Results under three different vocabularies also show that MS TextSpotter has higher accuracy than other text recognition models and is better suited to curved text recognition in natural scenes.
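The mask-quality idea in the abstract can be illustrated with a minimal sketch. It assumes the scheme follows the general mask scoring pattern: the regression target is the IoU between a binarized predicted mask and its ground-truth mask, and the predicted IoU is then used to rescale the classification confidence. The function names, the 0.5 threshold, and the rescoring rule are illustrative assumptions, not details confirmed by the abstract.

```python
# Illustrative sketch only: names, threshold, and rescoring rule are assumptions.

def mask_iou(pred_probs, gt_mask, threshold=0.5):
    """IoU between a binarized predicted mask and a ground-truth mask.

    pred_probs: 2D list of per-pixel foreground probabilities.
    gt_mask:    2D list of 0/1 ground-truth labels with the same shape.
    """
    intersection = 0
    union = 0
    for pred_row, gt_row in zip(pred_probs, gt_mask):
        for prob, label in zip(pred_row, gt_row):
            pred_on = prob > threshold  # binarize the prediction
            gt_on = bool(label)
            intersection += pred_on and gt_on
            union += pred_on or gt_on
    # An empty union means neither mask has foreground pixels.
    return intersection / union if union else 0.0


def rescored_confidence(cls_score, predicted_iou):
    """Scale the classification score by the (predicted) mask IoU."""
    return cls_score * predicted_iou


pred = [[0.9, 0.8, 0.1],
        [0.7, 0.2, 0.1]]
gt = [[1, 1, 0],
      [0, 1, 0]]

iou_target = mask_iou(pred, gt)
print(iou_target)                            # IoU regression target
print(rescored_confidence(0.9, iou_target))  # confidence after rescoring
```

During training, a value like `iou_target` would serve as the label for the mask-quality branch; at inference, the branch's own prediction would replace it, so that a poorly segmented text instance receives a lower final score even when its classification confidence is high.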

This work was supported in part by the National Natural Science Foundation of China under Grant 61963017; in part by Shanghai Educational Science Research Project, China, under Grant C2022056; in part by Shanghai Science and Technology Program, China, under Grant 23010501000; in part by Humanities and Social Sciences of Ministry of Education Planning Fund, China, under Grant 22YJAZHA145.

Author information

Contributions

Author Contributions: Conceptualization, Y.L. and Y.Z.; Investigation, Y.Z., Y.L., and H.Z.; Software, Q.Q.; Writing-Original Draft Preparation, Y.Z. and Y.L.; Writing-Review and Editing, H.Z.; Funding Acquisition, Y.L., J.W., and H.Z.

Corresponding author

Correspondence to Yanli Liu.

Ethics declarations

Conflicts of Interest: The authors declare no conflict of interest.

Copyright information

© 2024 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Zhong, Y., Zhang, H., Liu, Y., Qian, Q., Wang, J. (2024). MS TextSpotter: An Intelligent Instance Segmentation Scheme for Semantic Scene Text Recognition in Asian Social Networks. In: Li, J., Zhang, B., Ying, Y. (eds) 6GN for Future Wireless Networks. 6GN 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 553. Springer, Cham. https://doi.org/10.1007/978-3-031-53401-0_18

  • DOI: https://doi.org/10.1007/978-3-031-53401-0_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-53400-3

  • Online ISBN: 978-3-031-53401-0

  • eBook Packages: Computer Science, Computer Science (R0)
