Case Study of Few-Shot Learning in Text Recognition Models

Wang, Jianzong; Si, Shijing; Hong, Zhenhou; Qu, Xiaoyang; Zhu, Xinghua; Xiao, Jing

doi:10.1007/978-3-030-91560-5_29

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13081)

Included in the following conference series:

International Conference on Web Information Systems Engineering

Abstract

Optical text recognition models are widely applied in document processing systems. However, a high-quality text recognition model usually requires a large number of samples and extensive time and computational resources. In this paper, we propose a few-shot learning framework for unsegmented text recognition, which comprises a conventional encoder-decoder recognition module and a generative module for convolutional feature generation. In the meta-training stage, a base model for general text recognition and feature vector generation is trained on a large synthesized text image dataset. In the meta-testing stage, the base model is adjusted with a small number of authentic samples. Complemented by synthesized feature vectors, the base model is adapted to the target dataset distribution. The proposed framework requires only a few authentic samples and is both data- and time-efficient in adapting existing models to new target datasets. Experimental results on authentic datasets used in industrial applications show that the proposed meta-testing approach outperforms conventional transfer learning by up to 5.84%.
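The meta-testing idea described above, in which a few authentic samples are complemented by synthesized feature vectors, can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses a learned generative module, whereas the hypothetical `synthesize_features` below simply perturbs real feature vectors with Gaussian noise. The function names, the `noise_scale` parameter, and the toy 4-dimensional features are all assumptions made for illustration.

```python
import random


def synthesize_features(real_feats, labels, n_extra, noise_scale=0.05, seed=0):
    """Stand-in for a learned feature generator (an assumption, not the paper's module).

    Each synthetic vector is a randomly chosen real feature vector plus
    Gaussian noise; it inherits the label of the vector it was derived from.
    """
    rng = random.Random(seed)
    synth_feats, synth_labels = [], []
    for _ in range(n_extra):
        i = rng.randrange(len(real_feats))
        synth_feats.append([x + rng.gauss(0.0, noise_scale) for x in real_feats[i]])
        synth_labels.append(labels[i])
    return synth_feats, synth_labels


def build_adaptation_set(real_feats, labels, n_extra):
    """Combine the few authentic feature vectors with synthesized ones."""
    synth_feats, synth_labels = synthesize_features(real_feats, labels, n_extra)
    return real_feats + synth_feats, labels + synth_labels


# Three hypothetical authentic samples with 4-dimensional features.
real = [[0.1, 0.2, 0.3, 0.4], [0.9, 0.8, 0.7, 0.6], [0.5, 0.5, 0.5, 0.5]]
labels = ["A", "B", "C"]
feats, labs = build_adaptation_set(real, labels, n_extra=6)
print(len(feats), len(labs))  # 9 9
```

In a real pipeline, the combined set would be used to fine-tune the base model's decoder on the target distribution; the noise-based generator here only shows where synthesized features enter the adaptation data.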

Acknowledgment

This work is supported by the National Key Research and Development Program of China under grant No. 2018YFB0204403. The corresponding author is Shijing Si of Ping An Technology (Shenzhen) Co., Ltd.

Author information

Authors and Affiliations

Ping An Technology (Shenzhen) Co., Ltd., Shenzhen, China
Jianzong Wang, Shijing Si, Zhenhou Hong, Xiaoyang Qu, Xinghua Zhu & Jing Xiao

Editor information

Editors and Affiliations

School of Computer Science and Engineering, The University of New South Wales, Sydney, NSW, Australia
Wenjie Zhang
Peking University, Beijing, China
Lei Zou
Zayed University, Dubai, United Arab Emirates
Zakaria Maamar
Swinburne University of Technology, Melbourne, VIC, Australia
Lu Chen

About this paper

Cite this paper

Wang, J., Si, S., Hong, Z., Qu, X., Zhu, X., Xiao, J. (2021). Case Study of Few-Shot Learning in Text Recognition Models. In: Zhang, W., Zou, L., Maamar, Z., Chen, L. (eds) Web Information Systems Engineering – WISE 2021. WISE 2021. Lecture Notes in Computer Science, vol 13081. Springer, Cham. https://doi.org/10.1007/978-3-030-91560-5_29

DOI: https://doi.org/10.1007/978-3-030-91560-5_29
Published: 01 January 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-91559-9
Online ISBN: 978-3-030-91560-5
eBook Packages: Computer Science, Computer Science (R0)
