Abstract
Chinese text line recognition has been applied in a wide variety of scenarios. As an ideographic writing system, Chinese characters carry rich semantic information and are composed of basic components. However, previous methods mainly convert each Chinese character into a discrete label to facilitate the computation of cross-entropy loss, leaving fine-grained glyph information (e.g., strokes and radicals) and semantic information unexploited. Glyph information is crucial for recognizing Chinese characters with similar appearances, since such characters differ only slightly in local strokes; it reflects these differences and guides the model to learn fine-grained local features. Moreover, compared to discrete category labels, character semantic information introduces diverse visual concepts that enrich the final character representation. This paper presents a Chinese text recognition method that exploits glyph and character semantic information to acquire effective text representations. Specifically, we propose a Glyph-Aware Decoder that identifies characters by dynamically fusing global visual features with local stroke and radical features, and we introduce a Contrastive Visual–Textual Learning module that enhances the visual features of Chinese characters with their semantic information. Experiments show that the proposed model achieves state-of-the-art results on Chinese text recognition benchmarks.
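The Contrastive Visual–Textual Learning module described above pairs each character's visual features with a semantic (textual) embedding. A minimal NumPy sketch of a CLIP-style symmetric contrastive (InfoNCE) objective of this general kind is given below; the function name, batch layout, and temperature value are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def contrastive_vt_loss(visual, textual, temperature=0.07):
    """Symmetric InfoNCE loss between visual and textual (semantic)
    character embeddings. Matching pairs share the same row index;
    all other rows in the batch act as negatives."""
    # L2-normalize both embedding batches so dot products are cosines
    v = visual / np.linalg.norm(visual, axis=1, keepdims=True)
    t = textual / np.linalg.norm(textual, axis=1, keepdims=True)
    logits = v @ t.T / temperature  # (B, B) pairwise similarity matrix
    labels = np.arange(len(v))      # correct match sits on the diagonal

    def cross_entropy(lg):
        # numerically stable log-softmax over each row
        lg = lg - lg.max(axis=1, keepdims=True)
        log_prob = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_prob[labels, labels].mean()

    # average the visual-to-textual and textual-to-visual directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Minimizing this loss pulls each character's visual feature toward its own semantic embedding and pushes it away from the embeddings of other characters in the batch, which is how semantic information can enrich the visual representation.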
Acknowledgements
This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDC08020400).
About this article
Cite this article
Wu, S., Li, Y. & Wang, Z. Chinese text recognition enhanced by glyph and character semantic information. IJDAR 27, 45–56 (2024). https://doi.org/10.1007/s10032-023-00444-9