
Open-Set Text Recognition via Shape-Awareness Visual Reconstruction

  • Conference paper
  • Published in: Document Analysis and Recognition - ICDAR 2023 (ICDAR 2023)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14192)

Abstract

Open-Set Text Recognition (OSTR) is an emerging task that models the constantly evolving character set in open-world character recognition applications. Compared to conventional counterparts, the OSTR task demands actively spotting and incrementally recognizing novel characters. Existing methods have demonstrated some success, yet confusion among similar characters remains a major challenge, potentially due to insufficient shape information being preserved in the character features. In this work, we propose to alleviate this problem via visual reconstruction. Specifically, a glyph reconstruction task is adopted to instill shape awareness. Furthermore, cut-and-mixed characters are introduced to alleviate overfitting by improving the coverage of the glyph space. Finally, a cycle classification task is proposed to prioritize the preservation of classification-critical regions by feeding reconstructed images to a classifier network. Extensive experiments show that both tasks yield satisfactory improvements on the OSTR task, and the full model demonstrates decent performance in recognizing both seen and novel characters.
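
The abstract's auxiliary objectives are straightforward to express in code. Below is a minimal PyTorch-style sketch of the three ingredients described above: a glyph reconstruction loss that forces character features to retain shape information, a cut-and-mix augmentation that splices glyphs to widen coverage of the glyph space, and a cycle classification loss that re-classifies the reconstructed image so classification-critical regions are preserved. All module names and architectural choices here are illustrative assumptions, not the authors' implementation (their code is linked in Note 1 below).

```python
import torch.nn as nn
import torch.nn.functional as F

class GlyphDecoder(nn.Module):
    """Hypothetical decoder: reconstructs a canonical glyph image
    from a pooled character feature (architecture is illustrative)."""
    def __init__(self, feat_dim=512, glyph_size=32):
        super().__init__()
        self.glyph_size = glyph_size
        self.net = nn.Sequential(
            nn.Linear(feat_dim, glyph_size * glyph_size),
            nn.Sigmoid(),  # glyph pixels assumed normalized to [0, 1]
        )

    def forward(self, feat):
        out = self.net(feat)
        return out.view(-1, 1, self.glyph_size, self.glyph_size)

def cut_and_mix(glyph_a, glyph_b):
    """Splice halves of two glyph batches (B, 1, H, W) to synthesize
    novel shapes, improving coverage of the glyph space."""
    mixed = glyph_a.clone()
    w = glyph_a.shape[-1]
    mixed[..., w // 2:] = glyph_b[..., w // 2:]
    return mixed

def shape_aware_losses(encoder, decoder, classifier, images, glyphs, labels):
    """The two auxiliary objectives from the abstract:
    (1) glyph reconstruction keeps shape cues in the features;
    (2) cycle classification re-classifies the reconstruction so that
        classification-critical regions survive the bottleneck."""
    feats = encoder(images)                     # (B, feat_dim)
    recon = decoder(feats)                      # (B, 1, H, W)
    loss_rec = F.l1_loss(recon, glyphs)         # shape-awareness signal
    logits = classifier(recon)                  # classify the reconstruction
    loss_cyc = F.cross_entropy(logits, labels)  # cycle classification signal
    return loss_rec, loss_cyc
```

In training, these two losses would be added to the main recognition objective with weighting coefficients; the exact weights and architectures are design choices not specified in this sketch.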


Notes

  1. https://github.com/lancercat/OpenSAVR.

  2. Note that a label may have more than one case, and each case yields a dedicated prototype; thus \(c\le n\). (A minimal counting illustration follows these notes.)

  3. Also described in Appendix A.2.
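
As a concrete reading of Note 2, here is a tiny counting example; the label-to-case mapping below is made up purely for illustration.

```python
# Hypothetical mapping: a label may cover several cases (e.g. "a" vs "A"),
# and each case gets its own dedicated prototype.
cases_per_label = {"a": ["a", "A"], "b": ["b", "B"], "7": ["7"]}

c = len(cases_per_label)                           # number of labels
n = sum(len(v) for v in cases_per_label.values())  # number of prototypes
assert c <= n  # every label has at least one case, hence c <= n
```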


Acknowledgement

This research is supported by the National Key Research and Development Program of China (2020AAA0109700), the National Science Fund for Distinguished Young Scholars (62125601), the National Natural Science Foundation of China (62076024, 62006018), and the Interdisciplinary Research Project for Young Teachers of USTB (Fundamental Research Funds for the Central Universities) (FRF-IDRY-21-018).

Author information

Corresponding author: Xu-Cheng Yin.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Liu, C., Yang, C., Yin, X.C. (2023). Open-Set Text Recognition via Shape-Awareness Visual Reconstruction. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14192. Springer, Cham. https://doi.org/10.1007/978-3-031-41731-3_6

  • DOI: https://doi.org/10.1007/978-3-031-41731-3_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41730-6

  • Online ISBN: 978-3-031-41731-3

  • eBook Packages: Computer Science (R0)
