
ChaCo: Character Contrastive Learning for Handwritten Text Recognition

  • Conference paper
  • In: Frontiers in Handwriting Recognition (ICFHR 2022)

Abstract

Current mainstream text recognition models rely heavily on large-scale data and require expensive annotations to achieve high performance. Contrastive self-supervised learning methods, which minimize the distance between positive pairs, offer a promising way to alleviate this problem. Previous studies operate at the word level, taking the entire word image as model input. Characters, however, are the basic elements of words, so in this paper we implement contrastive learning from another perspective, namely that of characters. Specifically, we propose a simple yet effective method, termed ChaCo, which takes the characters and strokes (called character units) cropped from the word image as model input. With the commonly used random cropping approach, however, a positive pair may contain completely different characters, in which case minimizing the distance between its members is unreasonable. To address this issue, we introduce a Character Unit Cropping Module (CUCM) that ensures a positive pair contains the same characters by constraining the region from which the positive sample is selected. Experiments show that the proposed method achieves much better representation quality than previous methods while requiring fewer computational resources. Under the semi-supervised setting, ChaCo achieves promising performance, with an accuracy improvement of 13.1 points on the IAM dataset.
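The full details are in the chapter itself, but the abstract already states the key mechanism: sample an anchor crop of a character unit from the word image, then constrain the region from which the positive crop is sampled so that both views cover the same characters. Below is a minimal sketch of such constrained positive-pair cropping; the function name, the crop width, and the maximum-shift parameter are illustrative assumptions, not the authors' implementation of CUCM.

    import random
    from PIL import Image

    def constrained_positive_pair(word_img: Image.Image,
                                  crop_w: int = 32,
                                  max_shift: int = 4):
        """Illustrative sketch (not the authors' code): crop an anchor
        character unit at a random horizontal position, then force the
        positive crop to lie within max_shift pixels of the anchor so
        that both views contain (almost) the same characters."""
        w, h = word_img.size
        x_max = max(0, w - crop_w)
        # Anchor view: a random character-unit-sized window.
        x_a = random.randint(0, x_max)
        anchor = word_img.crop((x_a, 0, x_a + crop_w, h))
        # Positive view: constrained to a small neighbourhood of the
        # anchor, in the spirit of the Character Unit Cropping Module
        # (CUCM) described in the abstract.
        x_p = random.randint(max(0, x_a - max_shift),
                             min(x_max, x_a + max_shift))
        positive = word_img.crop((x_p, 0, x_p + crop_w, h))
        return anchor, positive

An unconstrained random crop would instead draw x_p uniformly from [0, x_max], which is exactly the failure mode the abstract describes: the two views of a "positive" pair may then show entirely different characters.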


Notes

  1. https://github.com/albumentations-team/albumentations.

  2. We contacted the authors of SeqCLR to get the training details.

References

  1. Aberdam, A., et al.: Sequence-to-sequence contrastive learning for text recognition. In: CVPR, pp. 15302–15312 (2021)
  2. Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR (2015)
  3. Buslaev, A., Iglovikov, V.I., Khvedchenya, E., Parinov, A., Druzhinin, M., Kalinin, A.A.: Albumentations: fast and flexible image augmentations. Information 11(2), 125 (2020)
  4. Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: ICML, pp. 1597–1607 (2020)
  5. Chen, X., Fan, H., Girshick, R., He, K.: Improved baselines with momentum contrastive learning. arXiv preprint arXiv:2003.04297 (2020)
  6. Chen, X., He, K.: Exploring simple siamese representation learning. In: CVPR, pp. 15750–15758 (2021)
  7. Graves, A., Fernández, S., Gomez, F., Schmidhuber, J.: Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In: ICML, pp. 369–376 (2006)
  8. Grill, J.B., et al.: Bootstrap your own latent: a new approach to self-supervised learning. In: NeurIPS, vol. 33, pp. 21271–21284 (2020)
  9. Grosicki, E., Abed, H.E.: ICDAR 2009 handwriting recognition competition. In: ICDAR, pp. 1398–1402. IEEE Computer Society (2009)
  10. He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: CVPR, pp. 9726–9735 (2020)
  11. Kleber, F., Fiel, S., Diem, M., Sablatnig, R.: CVL-DataBase: an off-line database for writer retrieval, writer identification and word spotting. In: ICDAR, pp. 560–564 (2013)
  12. Liu, H., et al.: Perceiving stroke-semantic context: hierarchical contrastive learning for robust scene text recognition. In: AAAI (2022)
  13. Liu, X., et al.: Self-supervised learning: generative or contrastive. IEEE Trans. Knowl. Data Eng. 1 (2021)
  14. Luo, C., Jin, L., Chen, J.: SimAN: exploring self-supervised representation learning of scene text via similarity-aware normalization. In: CVPR (2022)
  15. Marti, U.V., Bunke, H.: The IAM-database: an English sentence database for offline handwriting recognition. Int. J. Doc. Anal. Recogn. 5(1), 39–46 (2002). https://doi.org/10.1007/s100320200071
  16. Nguyen, N., et al.: Dictionary-guided scene text recognition. In: CVPR, pp. 7383–7392 (2021)
  17. van den Oord, A., Li, Y., Vinyals, O.: Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748 (2018)
  18. Tendle, A., Hasan, M.R.: A study of the generalizability of self-supervised representations. Mach. Learn. Appl. 6, 100124 (2021)
  19. Wang, T., et al.: Decoupled attention network for text recognition. In: AAAI, vol. 34, pp. 12216–12224 (2020)
  20. Wang, T., et al.: Implicit feature alignment: learn to convert text recognizer to text spotter. In: CVPR, pp. 5973–5982 (2021)
  21. Wang, X., Zhang, R., Shen, C., Kong, T., Li, L.: Dense contrastive learning for self-supervised visual pre-training. In: CVPR, pp. 3024–3033 (2021)
  22. Wu, Z., Xiong, Y., Yu, S.X., Lin, D.: Unsupervised feature learning via non-parametric instance discrimination. In: CVPR, pp. 3733–3742 (2018)
  23. Yan, R., Peng, L., Xiao, S., Yao, G.: Primitive representation learning for scene text recognition. In: CVPR, pp. 284–293 (2021)
  24. Zbontar, J., Jing, L., Misra, I., LeCun, Y., Deny, S.: Barlow twins: self-supervised learning via redundancy reduction. In: ICML, pp. 12310–12320 (2021)


Acknowledgement

This research is supported in part by NSFC (Grant No. 61936003), GD-NSF (Grant Nos. 2017A030312006 and 2021A1515011870), the Zhuhai Industry Core and Key Technology Research Project (Grant No. ZH22044702200058PJL), and the Science and Technology Foundation of Guangzhou Huangpu Development District (Grant No. 2020GH17).

Author information

Correspondence to Lianwen Jin.


A Appendix

In this appendix, the pseudo-code of the data augmentation described in Sect. 3.2 is provided for reference.

[Figure b: pseudo-code of the data augmentation pipeline; not reproduced in this preview.]
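As a stand-in for the missing figure, here is a minimal sketch of what such a pipeline could look like using the albumentations library linked in Note 1. The transform choices, parameters, and ordering below are our assumptions, not the paper's pseudo-code.

    import albumentations as A

    # Hypothetical augmentation pipeline in the spirit of Sect. 3.2;
    # the transforms and parameters are illustrative assumptions,
    # not the paper's actual pseudo-code.
    augment = A.Compose([
        A.Rotate(limit=5, p=0.5),                  # small in-plane rotation
        A.GaussianBlur(blur_limit=(3, 5), p=0.3),  # mild defocus blur
        A.RandomBrightnessContrast(p=0.5),         # photometric jitter
        A.GaussNoise(p=0.3),                       # sensor-like noise
    ])

    # Usage: apply independently to each view of a character unit
    # (expects an HWC uint8 numpy array, e.g. loaded with OpenCV):
    #   augmented = augment(image=char_unit)["image"]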


Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, X., Wang, T., Wang, J., Jin, L., Luo, C., Xue, Y. (2022). ChaCo: Character Contrastive Learning for Handwritten Text Recognition. In: Porwal, U., Fornés, A., Shafait, F. (eds) Frontiers in Handwriting Recognition. ICFHR 2022. Lecture Notes in Computer Science, vol 13639. Springer, Cham. https://doi.org/10.1007/978-3-031-21648-0_24

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-21648-0_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-21647-3

  • Online ISBN: 978-3-031-21648-0

  • eBook Packages: Computer Science, Computer Science (R0)
