
Chinese text recognition enhanced by glyph and character semantic information

  • Original Paper
  • Published in: International Journal on Document Analysis and Recognition (IJDAR)

Abstract

Chinese text line recognition has been applied in a wide variety of scenarios. As an ideographic writing system, Chinese characters carry rich semantic information and are built from basic components. However, previous methods mainly convert each Chinese character into a discrete label to facilitate computation of the cross-entropy loss, leaving fine-grained glyph information (e.g., strokes and radicals) and semantic information unexploited. Glyph information is crucial for recognizing visually similar Chinese characters, which differ only slightly in local strokes; it reflects these differences and guides the model to learn fine-grained local features. Compared to discrete category labels, character semantic information introduces diverse visual concepts, which enriches the final character representation. This paper presents a Chinese text recognition method that exploits glyph and character semantic information to acquire effective text representations. Specifically, we propose a Glyph-Aware Decoder that identifies characters by dynamically fusing global visual features with local stroke and radical features, and we introduce a Contrastive Visual-Textual Learning module that enhances the visual features of Chinese characters with their semantic information. Experiments show that our proposed model achieves state-of-the-art results on Chinese text recognition benchmarks.
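The two components named in the abstract can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: a symmetric InfoNCE-style loss stands in for the Contrastive Visual-Textual Learning module, and a sigmoid-gated mixing of global and local glyph features stands in for the Glyph-Aware Decoder's fusion step. All function names, shapes, and the gating scheme are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Normalize feature rows to unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def contrastive_vt_loss(visual, textual, temperature=0.07):
    """Symmetric InfoNCE-style loss aligning each character's visual
    feature (row of `visual`) with its semantic embedding (matching row
    of `textual`); matched pairs lie on the diagonal of the logits."""
    v = l2_normalize(visual)
    t = l2_normalize(textual)
    logits = (v @ t.T) / temperature  # (N, N) cosine-similarity matrix

    def xent_diag(l):
        # cross-entropy with the target class on the diagonal (row-wise softmax)
        l = l - l.max(axis=1, keepdims=True)
        log_p = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_p))

    return 0.5 * (xent_diag(logits) + xent_diag(logits.T))

def glyph_aware_fusion(global_feat, stroke_feat, radical_feat, w_gate, b_gate):
    """Gate-weighted fusion of global visual features with local stroke
    and radical features. All inputs are (N, D); `w_gate` is (2D, D)."""
    local = 0.5 * (stroke_feat + radical_feat)               # pooled local glyph evidence
    gate_in = np.concatenate([global_feat, local], axis=-1)  # (N, 2D)
    gate = 1.0 / (1.0 + np.exp(-(gate_in @ w_gate + b_gate)))  # sigmoid gate, (N, D)
    return gate * global_feat + (1.0 - gate) * local
```

As a sanity check, the contrastive loss should be smaller when visual and textual rows are correctly paired than when the pairing is shuffled, since matched pairs then dominate the diagonal of the similarity matrix.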



Acknowledgements

This work was supported by the Strategic Priority Research Program of the Chinese Academy of Sciences (XDC08020400).

Author information

Corresponding author

Correspondence to Zengfu Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wu, S., Li, Y. & Wang, Z. Chinese text recognition enhanced by glyph and character semantic information. IJDAR 27, 45–56 (2024). https://doi.org/10.1007/s10032-023-00444-9

