A Large Chinese Text Dataset in the Wild

  • Regular Paper
Journal of Computer Science and Technology

Abstract

We introduce a very large dataset of Chinese text in the wild. While optical character recognition (OCR) in document images is well studied and many commercial tools are available, detecting and recognizing text in natural images remains a challenging problem, especially for more complicated character sets such as Chinese. A lack of training data has long been a problem, particularly for deep learning methods, which require massive amounts of it. We describe a newly created dataset of Chinese text containing about 1 million character instances, drawn from 3 850 unique characters and annotated by experts in over 30 000 street view images. The dataset is challenging and diverse: it contains planar text, raised text, text under poor illumination, distant text, partially occluded text, etc. For each character, the annotation includes its underlying character, its bounding box, and six attributes that indicate the character's background complexity, appearance, style, etc. Besides the dataset, we give baseline results using state-of-the-art methods for three tasks: character recognition (top-1 accuracy of 80.5%), character detection (AP of 70.9%), and text line detection (AED of 22.1). The dataset, source code, and trained models are publicly available.
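
As an illustration of the per-character annotation and the top-1 accuracy metric mentioned in the abstract, the following is a minimal Python sketch. The class name `CharAnnotation`, the field names, the `(x, y, width, height)` bounding box layout, and the attribute keys are all hypothetical placeholders; the abstract does not specify the released file format or the attribute names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CharAnnotation:
    """One annotated character instance (hypothetical schema, not the released format)."""
    text: str                                # the underlying character
    bbox: Tuple[float, float, float, float]  # assumed layout: (x, y, width, height) in pixels
    attributes: Dict[str, bool] = field(default_factory=dict)  # up to six flags; keys are placeholders


def top1_accuracy(predictions: List[str], annotations: List[CharAnnotation]) -> float:
    """Fraction of instances whose top-ranked prediction equals the labeled character."""
    if len(predictions) != len(annotations):
        raise ValueError("predictions and annotations must have the same length")
    if not annotations:
        return 0.0
    correct = sum(pred == ann.text for pred, ann in zip(predictions, annotations))
    return correct / len(annotations)


# Three made-up instances; attribute keys are illustrative only.
samples = [
    CharAnnotation("中", (10.0, 20.0, 32.0, 32.0), {"occluded": False, "complex_background": True}),
    CharAnnotation("文", (50.0, 20.0, 30.0, 33.0), {"occluded": True, "complex_background": False}),
    CharAnnotation("字", (90.0, 21.0, 31.0, 32.0), {"occluded": False, "complex_background": False}),
]

# The second prediction is wrong, so two of three instances are counted as correct.
accuracy = top1_accuracy(["中", "又", "字"], samples)
```

Keeping the attributes as a dictionary of flags is only one possible design; it makes it easy to filter instances (for example, only occluded characters) before computing a metric on that subset.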

Author information

Corresponding author

Correspondence to Tai-Jiang Mu.

Electronic supplementary material

ESM 1

(PDF 697 kb)

About this article

Cite this article

Yuan, TL., Zhu, Z., Xu, K. et al. A Large Chinese Text Dataset in the Wild. J. Comput. Sci. Technol. 34, 509–521 (2019). https://doi.org/10.1007/s11390-019-1923-y

  • DOI: https://doi.org/10.1007/s11390-019-1923-y
