Abstract
The convolutional recurrent neural network is one of the most popular text recognition methods. Recurrent structures can capture long-term dependencies, but they are computationally more expensive than convolutional structures. We argue that Chinese text line recognition can rely on neighboring rather than complete contextual information, and that the information extracted from neighborhoods should only supplement the information extracted from character regions. We therefore propose a novel neighborhoods-based fully convolutional text recognition network (N-FTRN). It first extracts character-level feature sequences from text lines, then uses residual blocks in place of the recurrent structure to exploit contextual information. A reshape layer enables the network to recognize both vertical and horizontal text lines. Extensive experiments validate the efficiency and effectiveness of the proposed network. Compared with state-of-the-art methods, N-FTRN achieves comparable recognition performance on the ICDAR 2015 Chinese scene text competition dataset (TRW) with much more compact models.
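The idea of using residual blocks over a character-level feature sequence, so that each position sees only its neighbors and the neighborhood term is added on top of the character's own feature, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `neighborhood_residual_block`, the width-3 window, and the fixed scalar weights are assumptions chosen for illustration, whereas a real block would learn per-channel kernels and include normalization and nonlinearities.

```python
from typing import List

Vector = List[float]


def neighborhood_residual_block(
    features: List[Vector],
    w_left: float = 0.25,
    w_center: float = 0.5,
    w_right: float = 0.25,
) -> List[Vector]:
    """Add a width-3 convolution along the sequence axis to the identity path.

    The three weights are hypothetical fixed scalars shared by all channels.
    """
    if not features:
        return []
    zero = [0.0] * len(features[0])
    out: List[Vector] = []
    for t, center in enumerate(features):
        # Zero padding at both ends of the text line.
        left = features[t - 1] if t > 0 else zero
        right = features[t + 1] if t + 1 < len(features) else zero
        # Context term computed from the immediate neighborhood only.
        context = [
            w_left * l + w_center * c + w_right * r
            for l, c, r in zip(left, center, right)
        ]
        # Residual connection: the character's own feature passes through
        # unchanged, and the neighborhood term is added as a supplement.
        out.append([c + n for c, n in zip(center, context)])
    return out


# A toy sequence of four one-dimensional "character" features.
sequence = [[1.0], [2.0], [4.0], [8.0]]
mixed = neighborhood_residual_block(sequence)
```

Because each output position depends only on positions t-1, t, and t+1, all positions can be computed independently, unlike a recurrent layer that must step through the sequence in order. Stacking several such blocks widens the neighborhood gradually.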
Acknowledgments
This work is supported by the National Key R&D Program of China under contract No. 2017YFB1002203 and by the NSFC Key Projects of International (Regional) Cooperation and Exchanges under Grant 61860206004.
Cite this article
Li, H., Wang, W. & Lv, K. N-FTRN: Neighborhoods based fully convolutional network for Chinese text line recognition. Multimed Tools Appl 78, 22249–22268 (2019). https://doi.org/10.1007/s11042-019-7410-1
DOI: https://doi.org/10.1007/s11042-019-7410-1