Zero-Shot Chinese Text Recognition via Matching Class Embedding

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12823)

Abstract

This paper studies the challenging problem of zero-shot Chinese text recognition, in which a model is trained on text line images containing only seen characters and must then recognize unseen characters in new text line images. Most previous methods consider only zero-shot Chinese character recognition: they decompose Chinese characters into radical representations and recognize them at the radical level. Some recent methods have extended radical-based recognition from single characters to text lines. However, these methods require long training times and a complicated decoding process, and they are unsuitable for long text sequences. In this paper, we propose a novel zero-shot Chinese text recognition network (ZCTRN) that matches class embeddings with visual features. Specifically, the proposed model consists of three components: a text line encoder that extracts visual features from text line images, a class embedding module that encodes the character classes into class embeddings, and a bidirectional embedding transfer module that maps the class embeddings into the visual space while preserving the information of the original class embeddings. In addition, we use a distance-based CTC decoder to match the visual features with the class embeddings and output the recognition results. Experimental results obtained by applying the proposed network to the MTHv2 dataset and the ICDAR-2013 handwriting competition dataset show that our method not only preserves high accuracy in recognizing text line images containing seen characters, but also outperforms existing state-of-the-art models in recognizing text line images containing unseen characters.
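
The matching step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the squared Euclidean distance, the separate blank prototype, the greedy decoding, and the names `distance_logits` and `greedy_ctc_decode` are all assumptions made for illustration, and the bidirectional embedding transfer module is omitted (the class embeddings are assumed to already lie in the visual feature space).

```python
import numpy as np

def distance_logits(visual_feats, class_embeds, blank_embed):
    """Score each frame against each class prototype.

    visual_feats: (T, D) frame features from a text line encoder.
    class_embeds: (C, D) class embeddings, assumed to be in the visual space.
    blank_embed:  (D,) hypothetical prototype for the CTC blank symbol.
    Returns a (T, C + 1) array of negative squared Euclidean distances,
    where column 0 is the blank and column k is class k (1-based).
    """
    protos = np.vstack([blank_embed[None, :], class_embeds])  # (C + 1, D)
    diff = visual_feats[:, None, :] - protos[None, :, :]      # (T, C + 1, D)
    return -np.sum(diff ** 2, axis=-1)

def greedy_ctc_decode(logits, blank=0):
    """Best-path CTC decoding: collapse repeated labels, then drop blanks."""
    decoded, prev = [], blank
    for idx in logits.argmax(axis=1):
        if idx != prev and idx != blank:
            decoded.append(int(idx))
        prev = idx
    return decoded

# Toy example with three "seen" classes in a 2-D feature space.
class_embeds = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
blank_embed = np.zeros(2)
frames = np.array([
    [0.9, 0.1],    # closest to class 1
    [1.0, 0.0],    # class 1 again (repeat is collapsed)
    [0.05, 0.0],   # closest to the blank prototype
    [0.1, 0.9],    # closest to class 2
    [-0.8, 0.1],   # closest to class 3
])
decoded = greedy_ctc_decode(distance_logits(frames, class_embeds, blank_embed))
print(decoded)  # [1, 2, 3]

# Zero-shot flavour: an "unseen" class is added by appending its embedding,
# with no change to the decoder itself.
extended_embeds = np.vstack([class_embeds, [[0.0, -1.0]]])
extended_frames = np.vstack([frames, [[0.1, -0.9]]])
decoded_unseen = greedy_ctc_decode(
    distance_logits(extended_frames, extended_embeds, blank_embed)
)
print(decoded_unseen)  # [1, 2, 3, 4]
```

Because the decoder only compares frame features with whatever prototypes it is given, recognizing a new character in this sketch reduces to supplying one more row of `class_embeds`. That is the property that makes a matching-based design attractive for unseen classes.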

Notes

  1. https://github.com/Canjie-Luo/Text-Image-Augmentation

Acknowledgement

This research is supported in part by NSFC (Grant No. 61936003), the National Key Research and Development Program of China (No. 2016YFB1001405), GD-NSF (No. 2017A030312006), and the Guangdong Intellectual Property Office Project (2018-10-1).

Author information

Corresponding author

Correspondence to Lianwen Jin.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Huang, Y., Jin, L., Peng, D. (2021). Zero-Shot Chinese Text Recognition via Matching Class Embedding. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science, vol 12823. Springer, Cham. https://doi.org/10.1007/978-3-030-86334-0_9

  • DOI: https://doi.org/10.1007/978-3-030-86334-0_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86333-3

  • Online ISBN: 978-3-030-86334-0

  • eBook Packages: Computer Science, Computer Science (R0)
