Abstract
Text detection and recognition in natural scenes is an important task in computer vision. However, much of the text in natural scenes is curved, and text backgrounds are complex and diverse. Text detection and recognition models proposed in recent years have inherent defects; in particular, they produce many false positives in practice, which usually lowers detection and recognition accuracy. To solve this problem, we propose MS TextSpotter (Mask Scoring TextSpotter), an end-to-end trainable text detection and recognition model based on instance segmentation. First, we design a neural network based on Mask R-CNN. The network achieves accurate text detection and recognition through semantic segmentation, especially for multi-directional and curved text in natural scenes. Second, the network block we designed combines text features with predicted masks to learn the quality of text masks, and regresses the intersection-over-union (IoU) of text masks to improve the quality of character masks. The model was tested on the ICDAR2013, ICDAR2015, and Total-Text datasets. Experimental results show that the text detection recall rate increases by 1.4% and 0.2% on the first two datasets, respectively. Results under three different vocabularies also show that MS TextSpotter has higher accuracy than other text recognition models and is better suited to curved text recognition in natural scenes.
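The mask-quality target mentioned in the abstract can be illustrated with a small sketch. It assumes the regressed quantity is the intersection over union (IoU) between a predicted binary mask and its ground-truth mask, as in mask-scoring approaches in general. The function name `mask_iou` and the toy masks are hypothetical and are not taken from the authors' implementation.

```python
from typing import Sequence

# A binary mask is represented here as rows of 0/1 values (an assumption
# made for this sketch; real systems typically use array types).
Mask = Sequence[Sequence[int]]


def mask_iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of two same-shaped binary masks."""
    same_rows = len(pred) == len(target)
    if not same_rows or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    # Two empty masks have no overlap to measure; return 0.0 by convention.
    return intersection / union if union else 0.0


# Toy example: the prediction misses one column of the ground-truth region.
predicted = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
ground_truth = [
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
iou = mask_iou(predicted, ground_truth)  # 4 shared pixels out of 6 covered
```

In a mask-scoring setup, a value like `iou` would serve as the regression target for a branch that sees both the region features and the predicted mask, so that the predicted score reflects mask quality and not only classification confidence.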
This work was supported in part by the National Natural Science Foundation of China under Grant 61963017; in part by Shanghai Educational Science Research Project, China, under Grant C2022056; in part by Shanghai Science and Technology Program, China, under Grant 23010501000; in part by Humanities and Social Sciences of Ministry of Education Planning Fund, China, under Grant 22YJAZHA145.
Author information
Contributions
Author Contributions: Conceptualization, Y.L. and Y.Z.; Investigation, Y.Z., Y.L., and H.Z.; Software, Q.Q.; Writing - Original Draft Preparation, Y.Z. and Y.L.; Writing - Review and Editing, H.Z.; Funding Acquisition, Y.L., J.W., and H.Z.
Ethics declarations
Conflicts of Interest: The authors declare no conflict of interest.
Copyright information
© 2024 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
About this paper
Cite this paper
Zhong, Y., Zhang, H., Liu, Y., Qian, Q., Wang, J. (2024). MS TextSpotter: An Intelligent Instance Segmentation Scheme for Semantic Scene Text Recognition in Asian Social Networks. In: Li, J., Zhang, B., Ying, Y. (eds) 6GN for Future Wireless Networks. 6GN 2023. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 553. Springer, Cham. https://doi.org/10.1007/978-3-031-53401-0_18
DOI: https://doi.org/10.1007/978-3-031-53401-0_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-53400-3
Online ISBN: 978-3-031-53401-0
eBook Packages: Computer Science, Computer Science (R0)