Abstract
Most existing datasets for scene text recognition contain only a few thousand training samples with a very limited vocabulary, which cannot satisfy the data requirements of state-of-the-art deep learning based text recognition methods. Meanwhile, although synthetic datasets (e.g., SynthText90k) usually contain millions of samples, they cannot fully match the data distribution of the small target datasets collected in natural scenes. To address these problems, we propose a word-image generation method called SynthText-Transfer, which is capable of emulating the distribution of the target dataset. SynthText-Transfer uses a style transfer method to generate samples with arbitrary text content while preserving the texture of a reference sample from the target dataset. The generated images are not only visually similar to real images, but also improve the accuracy of state-of-the-art text recognition methods, especially on English and Chinese datasets with large alphabets (in which many characters appear in only a few samples, making them hard for sequence models to learn). Moreover, the proposed method is fast and flexible, with a speed competitive with common style transfer methods.
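As a rough illustration only (not the paper's actual implementation, which is not reproduced here), the Python/PyTorch sketch below shows adaptive instance normalization, a common building block of fast style transfer pipelines of the kind the abstract describes: the features of a rendered content image (carrying arbitrary text) are re-normalized to match the per-channel feature statistics of a reference crop from the target dataset. The encoder/decoder names in the usage comment are hypothetical placeholders for a pretrained VGG encoder and a learned decoder.

import torch

def adain(content_feat: torch.Tensor, style_feat: torch.Tensor,
          eps: float = 1e-5) -> torch.Tensor:
    """Adaptive instance normalization: shift and scale the content
    feature map so its per-channel mean/std match those of the style
    feature map. Both inputs have shape (N, C, H, W)."""
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content_feat - c_mean) / c_std + s_mean

# Hypothetical usage (encoder/decoder are assumed, not defined here):
# content = encoder(rendered_text_image)   # arbitrary text content
# style   = encoder(reference_scene_crop)  # texture of target sample
# fake    = decoder(adain(content, style)) # synthetic training image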
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant 61673029. It is also a research achievement of the Key Laboratory of Science, Technology and Standard in Press Industry (Key Laboratory of Intelligent Press Media Technology).
Additional information
Jiahui Li and Siwei Wang made equal contributions to this paper.
Cite this article
Li, J., Wang, S., Wang, Y. et al. Synthesizing data for text recognition with style transfer. Multimed Tools Appl 78, 29183–29196 (2019). https://doi.org/10.1007/s11042-018-6656-3