Abstract
Text detection in natural scene images is a challenging problem in computer vision. To robustly detect varied text in complex scenes, this paper proposes a hierarchical recursive text detection method. Text in natural scenes rarely appears in isolation; it is usually arranged into lines for easy reading. To find all possible text lines in an image, candidate text lines are first obtained using text edge boxes and a convolutional neural network. Then, to identify the true text lines accurately, these candidates are analyzed in a hierarchical recursive architecture: for each candidate, connected component segmentation and hierarchical random field based analysis are applied recursively until the detected text line no longer changes. The resulting text lines are output as the text detection result. Experiments on the ICDAR 2003 dataset, the ICDAR 2013 dataset, and the Street View Dataset show that the hierarchical recursive architecture improves text detection performance and that the proposed method achieves state-of-the-art results in scene text detection.
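The stopping rule in the abstract (repeat the analysis until the detected text line no longer changes) can be illustrated as a fixed-point loop. The following is only a minimal sketch under stated assumptions, not the method from the paper: the `Box` representation, the `refine_until_stable` helper, the `max_iters` safeguard, and the `toy_refine` stand-in are all hypothetical. In particular, `toy_refine` replaces the connected component segmentation and hierarchical random field analysis with a trivial step toward a fixed target box, purely so that the loop can run.

```python
from typing import Callable, Tuple

# Hypothetical representation of a text line: (x0, y0, x1, y1) in pixels.
Box = Tuple[int, int, int, int]


def refine_until_stable(box: Box,
                        refine: Callable[[Box], Box],
                        max_iters: int = 20) -> Box:
    """Apply `refine` repeatedly until the box stops changing.

    `max_iters` is an assumed safeguard against non-terminating refinement;
    the abstract does not mention such a limit.
    """
    for _ in range(max_iters):
        new_box = refine(box)
        if new_box == box:  # fixed point: the text line no longer changes
            break
        box = new_box
    return box


def toy_refine(box: Box) -> Box:
    """Stand-in for one round of segmentation and random-field analysis.

    Moves every coordinate one pixel toward a fixed, made-up "true" text
    line. A real refinement step would be computed from the image.
    """
    target = (10, 5, 50, 15)  # assumed ground truth, for illustration only

    def step(value: int, goal: int) -> int:
        # Move by +1, -1, or 0 depending on which side of the goal we are.
        return value + (goal > value) - (goal < value)

    return tuple(step(v, t) for v, t in zip(box, target))


if __name__ == "__main__":
    candidate = (0, 0, 60, 20)  # a loose candidate text line
    print(refine_until_stable(candidate, toy_refine))
```

The comparison `new_box == box` is what makes the loop a fixed-point iteration: the refinement is treated as a black box, and the loop only checks whether its output has stabilized.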














Acknowledgments
This work is supported by the National Natural Science Foundation of China (91520301).
Cite this article
Wang, X., Song, Y., Zhang, Y. et al. A hierarchical recursive method for text detection in natural scene images. Multimed Tools Appl 76, 26201–26223 (2017). https://doi.org/10.1007/s11042-016-4099-2
DOI: https://doi.org/10.1007/s11042-016-4099-2