
Open-Set Text Recognition via Shape-Awareness Visual Reconstruction

  • Conference paper
  • Published in: Document Analysis and Recognition - ICDAR 2023 (ICDAR 2023)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14192)

Abstract

Open-Set Text Recognition (OSTR) is an emerging task that models the constantly evolving character set in open-world character recognition applications. Compared to conventional counterparts, the OSTR task demands actively spotting and incrementally recognizing novel characters. Existing methods have demonstrated some success, yet confusion among similar characters remains a major challenge, potentially due to insufficient shape information being preserved in the character features. In this work, we propose to alleviate this problem via visual reconstruction. Specifically, a glyph reconstruction task is adopted to instill shape awareness. Furthermore, cut-and-mixed characters are introduced to alleviate overfitting by improving the coverage of the glyph space. Finally, a cycle classification task is proposed to prioritize the preservation of classification-critical regions by feeding reconstructed images to a classifier network. Extensive experiments show that both tasks yield satisfactory improvements on the OSTR task, and the full model demonstrates decent performance in recognizing both seen and novel characters.
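
The abstract's auxiliary objectives are straightforward to express in code. Below is a minimal PyTorch-style sketch of the three ingredients described above: a glyph reconstruction loss that forces character features to retain shape information, a cut-and-mix augmentation that splices glyphs to widen coverage of the glyph space, and a cycle classification loss that re-classifies the reconstructed image so classification-critical regions are preserved. All module names and architectural choices here are illustrative assumptions, not the authors' implementation (their code is linked in Note 1 below).

```python
import torch.nn as nn
import torch.nn.functional as F

class GlyphDecoder(nn.Module):
    """Hypothetical decoder: reconstructs a canonical glyph image
    from a pooled character feature (architecture is illustrative)."""
    def __init__(self, feat_dim=512, glyph_size=32):
        super().__init__()
        self.glyph_size = glyph_size
        self.net = nn.Sequential(
            nn.Linear(feat_dim, glyph_size * glyph_size),
            nn.Sigmoid(),  # glyph pixels assumed normalized to [0, 1]
        )

    def forward(self, feat):
        out = self.net(feat)
        return out.view(-1, 1, self.glyph_size, self.glyph_size)

def cut_and_mix(glyph_a, glyph_b):
    """Splice halves of two glyph batches (B, 1, H, W) to synthesize
    novel shapes, improving coverage of the glyph space."""
    mixed = glyph_a.clone()
    w = glyph_a.shape[-1]
    mixed[..., w // 2:] = glyph_b[..., w // 2:]
    return mixed

def shape_aware_losses(encoder, decoder, classifier, images, glyphs, labels):
    """The two auxiliary objectives from the abstract:
    (1) glyph reconstruction keeps shape cues in the features;
    (2) cycle classification re-classifies the reconstruction so that
        classification-critical regions survive the bottleneck."""
    feats = encoder(images)                     # (B, feat_dim)
    recon = decoder(feats)                      # (B, 1, H, W)
    loss_rec = F.l1_loss(recon, glyphs)         # shape-awareness signal
    logits = classifier(recon)                  # classify the reconstruction
    loss_cyc = F.cross_entropy(logits, labels)  # cycle classification signal
    return loss_rec, loss_cyc
```

In training, these two losses would be added to the main recognition objective with weighting coefficients; the exact weights and architectures are design choices not specified in this sketch.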


Notes

  1. https://github.com/lancercat/OpenSAVR.

  2. Note that a label may have more than one case, and each case yields a dedicated prototype; thus \(c\le n\). (A minimal counting illustration follows these notes.)

  3. Also described in Appendix A.2.
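
As a concrete reading of Note 2, here is a tiny counting example; the label-to-case mapping below is made up purely for illustration.

```python
# Hypothetical mapping: a label may cover several cases (e.g. "a" vs "A"),
# and each case gets its own dedicated prototype.
cases_per_label = {"a": ["a", "A"], "b": ["b", "B"], "7": ["7"]}

c = len(cases_per_label)                           # number of labels
n = sum(len(v) for v in cases_per_label.values())  # number of prototypes
assert c <= n  # every label has at least one case, hence c <= n
```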


Acknowledgement

This research is supported by the National Key Research and Development Program of China (2020AAA0109700), the National Science Fund for Distinguished Young Scholars (62125601), the National Natural Science Foundation of China (62076024, 62006018), and the Interdisciplinary Research Project for Young Teachers of USTB (Fundamental Research Funds for the Central Universities) (FRF-IDRY-21-018).

Author information

Corresponding author: Xu-Cheng Yin.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Liu, C., Yang, C., Yin, X.C. (2023). Open-Set Text Recognition via Shape-Awareness Visual Reconstruction. In: Fink, G.A., Jain, R., Kise, K., Zanibbi, R. (eds) Document Analysis and Recognition - ICDAR 2023. ICDAR 2023. Lecture Notes in Computer Science, vol 14192. Springer, Cham. https://doi.org/10.1007/978-3-031-41731-3_6

  • DOI: https://doi.org/10.1007/978-3-031-41731-3_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41730-6

  • Online ISBN: 978-3-031-41731-3

  • eBook Packages: Computer Science (R0)
