
Synthesizing data for text recognition with style transfer


Abstract

Most existing datasets for scene text recognition contain only a few thousand training samples with a very limited vocabulary, which cannot meet the data requirements of state-of-the-art deep learning based text recognition methods. Meanwhile, although synthetic datasets (e.g., SynthText90k) usually contain millions of samples, they cannot fully fit the data distribution of small target datasets in natural scenes. To address these problems, we propose a word data generation method called SynthText-Transfer, which is capable of emulating the distribution of the target dataset. SynthText-Transfer uses a style transfer method to generate samples with arbitrary text content that preserve the texture of a reference sample in the target dataset. The generated images are not only visually similar to real images, but also improve the accuracy of state-of-the-art text recognition methods, especially on English and Chinese datasets with large alphabets (in which many characters appear in only a few samples, making them hard for sequence models to learn). Moreover, the proposed method is fast and flexible, with a speed competitive with common style transfer methods.
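To make the pipeline concrete, below is a minimal illustrative sketch, not the authors' implementation: it renders a word image with arbitrary text content, then transfers the appearance of a reference crop from the target dataset onto it. To stay self-contained it substitutes simple channel-wise mean/std matching in pixel space for the paper's neural style transfer, and the font path and image filenames are placeholder assumptions.

```python
# Illustrative sketch of style-transfer-based word data synthesis.
# NOTE: this is NOT the authors' method; it stands in channel-wise
# mean/std matching in pixel space for their neural style transfer.
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_word(text, size=(256, 64), font_path="DejaVuSans.ttf"):
    """Render a plain word image with arbitrary text (the 'content')."""
    img = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(font_path, 40)  # placeholder font path
    draw.text((8, 8), text, fill="black", font=font)
    return np.asarray(img).astype(np.float32)

def match_statistics(content, style):
    """Shift each colour channel of `content` to the mean/std of `style`,
    a crude pixel-space analogue of transferring texture statistics."""
    out = np.empty_like(content)
    for c in range(3):
        mu_c, std_c = content[..., c].mean(), content[..., c].std() + 1e-6
        mu_s, std_s = style[..., c].mean(), style[..., c].std() + 1e-6
        out[..., c] = (content[..., c] - mu_c) / std_c * std_s + mu_s
    return np.clip(out, 0, 255).astype(np.uint8)

# Usage: give a rendered word the colour statistics of a real crop from
# the target dataset ("real_word_crop.jpg" is a placeholder file).
reference = np.asarray(
    Image.open("real_word_crop.jpg").convert("RGB").resize((256, 64)),
    dtype=np.float32)
sample = match_statistics(render_word("arbitrary"), reference)
Image.fromarray(sample).save("synthetic_word.png")
```

The actual method performs the transfer with a neural style transfer model, so local stroke texture, not just global colour statistics, is carried over from the reference sample.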





Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant 61673029. It is also a research achievement of the Key Laboratory of Science, Technology and Standard in Press Industry (Key Laboratory of Intelligent Press Media Technology).

Author information


Corresponding author

Correspondence to Yongtao Wang.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Jiahui Li and Siwei Wang contributed equally to this paper.


About this article


Cite this article

Li, J., Wang, S., Wang, Y. et al. Synthesizing data for text recognition with style transfer. Multimed Tools Appl 78, 29183–29196 (2019). https://doi.org/10.1007/s11042-018-6656-3

