N-FTRN: Neighborhoods based fully convolutional network for Chinese text line recognition

Multimedia Tools and Applications

Abstract

Convolutional recurrent neural networks are among the most popular text recognition methods. Recurrent structures can capture long-term dependencies, but they are computationally more expensive than convolutional structures. We argue that Chinese text line recognition can rely on neighboring rather than entire contextual information, and that the information extracted from neighborhoods should only supplement the information extracted from character regions. We therefore propose a novel neighborhood-based fully convolutional text recognition network (N-FTRN). It first extracts character-level feature sequences from text lines, then uses residual blocks instead of a recurrent structure to exploit contextual information. A reshape layer enables the network to recognize both vertical and horizontal text lines. Extensive experiments validate the efficiency and effectiveness of the proposed network. Compared with state-of-the-art methods, we achieve comparable recognition performance on the Chinese scene text competition dataset (TRW) of ICDAR 2015 with much more compact models.
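
The two ideas named in the abstract, a reshape step that handles both text orientations and a residual block that adds neighbor context to character-level features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `to_sequence` and `neighborhood_residual_block`, the tensor layout, the kernel size, and the ReLU activation are all assumptions made for illustration, since the abstract does not specify the layer configuration.

```python
import numpy as np


def to_sequence(feature_map, vertical=False):
    """Turn a (C, H, W) feature map into a (T, D) sequence of feature vectors.

    Assumed layout (not from the paper): horizontal lines are read along W,
    vertical lines along H. Swapping the two axes for vertical lines lets the
    rest of the network treat both orientations identically.
    """
    feature_map = np.asarray(feature_map, dtype=float)
    if vertical:
        feature_map = feature_map.transpose(0, 2, 1)
    c, h, w = feature_map.shape
    # One D = C * H dimensional vector per position along the reading direction.
    return feature_map.reshape(c * h, w).T


def neighborhood_residual_block(seq, weights, bias):
    """Add context from a small neighborhood to each position of a sequence.

    seq:     (T, D) character-level features
    weights: (K, D, D) 1-D convolution kernel with odd K (hypothetical size)
    bias:    (D,)

    The identity path keeps each position's own features; the convolution
    branch only supplements them with information from K // 2 neighbors on
    each side, so no position sees the entire line.
    """
    seq = np.asarray(seq, dtype=float)
    k = weights.shape[0]
    pad = k // 2
    padded = np.pad(seq, ((pad, pad), (0, 0)))  # zero padding along time
    t = seq.shape[0]
    context = np.zeros_like(seq)
    for offset in range(k):
        context += padded[offset:offset + t] @ weights[offset]
    context = np.maximum(context + bias, 0.0)  # ReLU (assumed activation)
    return seq + context
```

With a kernel of size K, each output position depends only on K // 2 neighbors per side, whereas a recurrent layer propagates information across the whole line one step at a time. Stacking several such blocks widens the neighborhood while every position can still be computed in parallel.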

Acknowledgments

This work is supported by the National Key R&D Program of China under contract No. 2017YFB1002203 and by the NSFC Key Projects of International (Regional) Cooperation and Exchanges under Grant 61860206004.

Author information

Corresponding author

Correspondence to Weiqiang Wang.

About this article

Cite this article

Li, H., Wang, W. & Lv, K. N-FTRN: Neighborhoods based fully convolutional network for Chinese text line recognition. Multimed Tools Appl 78, 22249–22268 (2019). https://doi.org/10.1007/s11042-019-7410-1

  • DOI: https://doi.org/10.1007/s11042-019-7410-1
