MS TextSpotter: An Intelligent Instance Segmentation Scheme for Semantic Scene Text Recognition in Asian Social Networks

Zhong, Yikai; Zhang, Heng; Liu, Yanli; Qian, Qiang; Wang, Junwen

doi:10.1007/978-3-031-53401-0_18

Part of the book series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (LNICST, volume 553)

Included in the following conference series:

International Conference on 6GN for Future Wireless Networks


Abstract

Text detection and recognition in natural scenes is an important task in computer vision. However, most text in natural scenes is curved, and text backgrounds are complex and diverse. The text detection and recognition models proposed in recent years have inherent defects; in particular, they produce many false positives, which usually lowers detection and recognition accuracy. To solve this problem, we propose MS TextSpotter (Mask Scoring TextSpotter), an end-to-end trainable text detection and recognition model based on instance segmentation. First, we design a neural network based on Mask R-CNN. The network achieves accurate text detection and recognition through semantic segmentation, especially for multi-directional and curved text in natural scenes. Second, the network block we designed combines text features with predicted masks to learn the quality of text masks, and regresses the intersection-over-union (IoU) of text masks to improve the quality of character masks. The model was tested on the ICDAR2013, ICDAR2015, and Total-Text datasets. Experimental results show that the text detection recall rate increases by 1.4% and 0.2% on the first two datasets, respectively. Results under three different vocabularies also show that MS TextSpotter has higher accuracy than other text recognition models and is better suited to curved text recognition in natural scenes.
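
As a rough illustration of the mask-quality step described in the abstract, the sketch below computes the intersection-over-union between a predicted mask and a ground-truth mask, which is the kind of value a mask-scoring block would be trained to regress. It is a minimal sketch and not the authors' implementation: the function names, the 0.5 binarization threshold, and the rescoring rule (classification score multiplied by predicted IoU, borrowed from Mask Scoring R-CNN) are assumptions, since the abstract does not specify them.

```python
def mask_iou(pred, target, threshold=0.5):
    """IoU between a predicted soft mask and a binary ground-truth mask.

    Both masks are equally sized 2-D lists. The threshold used to
    binarize the prediction is an assumption, not taken from the paper.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p_on = p >= threshold   # pixel predicted as text
            t_on = bool(t)          # pixel labeled as text
            intersection += p_on and t_on
            union += p_on or t_on
    # An empty union means neither mask marks any pixel.
    return intersection / union if union else 0.0


def mask_score(cls_score, predicted_iou):
    """Hypothetical rescoring rule (assumed, following Mask Scoring R-CNN)."""
    return cls_score * predicted_iou


pred = [[0.9, 0.8, 0.1],
        [0.7, 0.2, 0.0]]
target = [[1, 1, 0],
          [0, 1, 0]]

iou = mask_iou(pred, target)      # 2 shared pixels out of 4 in the union
score = mask_score(0.9, iou)
print(iou, score)
```

During training, a value such as `iou` would serve as the regression target for the mask-quality branch; at inference time, the branch's predicted IoU can down-weight confident detections whose masks are poor, which is one plausible way to reduce false positives.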

This work was supported in part by the National Natural Science Foundation of China under Grant 61963017; in part by Shanghai Educational Science Research Project, China, under Grant C2022056; in part by Shanghai Science and Technology Program, China, under Grant 23010501000; in part by Humanities and Social Sciences of Ministry of Education Planning Fund, China, under Grant 22YJAZHA145.



Author information

Authors and Affiliations

School of Electronic Information, Shanghai Dianji University, Shanghai, 201306, China
Yikai Zhong, Heng Zhang, Yanli Liu, Qiang Qian & Junwen Wang


Contributions

Author Contributions: Conceptualization, Y.L. and Y.Z.; Investigation, Y.Z., Y.L., and H.Z.; Software, Q.Q.; Writing (original draft preparation), Y.Z. and Y.L.; Writing (review and editing), H.Z.; Funding Acquisition, Y.L., J.W., and H.Z.

Corresponding author

Correspondence to Yanli Liu.

Editor information

Editors and Affiliations

Shanghai Dianji University, Shanghai, China
Jingchao Li
Kanagawa University, Yokohama, Japan
Bin Zhang
Shanghai University of Electric Power, Yangpu, China
Yulong Ying

Ethics declarations

Conflicts of Interest: The authors declare no conflict of interest.


About this paper

Cite this paper

Zhong, Y., Zhang, H., Liu, Y., Qian, Q., Wang, J. (2024). MS TextSpotter: An Intelligent Instance Segmentation Scheme for Semantic Scene Text Recognition in Asian Social Networks. In: Li, J., Zhang, B., Ying, Y. (eds) 6GN for Future Wireless Networks. 6GN 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 553. Springer, Cham. https://doi.org/10.1007/978-3-031-53401-0_18


DOI: https://doi.org/10.1007/978-3-031-53401-0_18
Published: 09 March 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53400-3
Online ISBN: 978-3-031-53401-0
eBook Packages: Computer Science, Computer Science (R0)
