A hierarchical recursive method for text detection in natural scene images

Multimedia Tools and Applications

Abstract

Text detection in natural scene images is a challenging problem in computer vision. To robustly detect varied text in complex scenes, this paper proposes a hierarchical recursive text detection method. Text in natural scenes rarely appears in isolation; it is usually arranged into lines for easy reading. To find all possible text lines in an image, candidate text lines are first obtained using a text edge box method and a convolutional neural network. Then, to accurately identify the true text lines in the image, these candidates are analyzed in a hierarchical recursive architecture. For each candidate, connected component segmentation and hierarchical random field based analysis are applied recursively until the detected text line no longer changes. The detected text lines are then output as the text detection result. Experiments on the ICDAR 2003 dataset, the ICDAR 2013 dataset and the Street View Dataset show that the hierarchical recursive architecture improves text detection performance and that the proposed method achieves state-of-the-art results in scene text detection.
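The recursive stage described in the abstract (re-analyzing each candidate text line until it stops changing) can be read as a fixed-point iteration. The following is a minimal sketch of that control flow only, not the authors' implementation: the box representation, the names `refine_until_stable` and `detect_text_lines`, the `max_iters` cap, and the `toy_refine` stand-in are all assumptions made for illustration. In the paper, the refinement step would be connected component segmentation followed by hierarchical random field based analysis, neither of which is reproduced here.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical representation of a text line: (x0, y0, x1, y1) in pixels.
Box = Tuple[int, int, int, int]
# A refinement step returns an updated box, or None to reject the candidate.
Refine = Callable[[Box], Optional[Box]]


def refine_until_stable(box: Box, refine: Refine, max_iters: int = 20) -> Optional[Box]:
    """Apply `refine` repeatedly until the box no longer changes.

    Returns None if the candidate is rejected along the way. `max_iters`
    is an assumed safety cap; the abstract does not mention one.
    """
    current = box
    for _ in range(max_iters):
        updated = refine(current)
        if updated is None:
            return None          # candidate judged not to be text
        if updated == current:
            return current       # fixed point reached: line is stable
        current = updated
    return current               # cap hit; return the latest estimate


def detect_text_lines(candidates: List[Box], refine: Refine, max_iters: int = 20) -> List[Box]:
    """Refine every candidate and collect the distinct surviving lines."""
    results: List[Box] = []
    for box in candidates:
        final = refine_until_stable(box, refine, max_iters)
        if final is not None and final not in results:
            results.append(final)
    return results


def toy_refine(box: Box) -> Optional[Box]:
    """Stand-in for segmentation plus random-field analysis (illustration only).

    Rejects boxes that do not overlap a fixed 'true' text region, and
    otherwise moves each edge one pixel toward that region.
    """
    true_box = (10, 10, 50, 20)
    x0, y0, x1, y1 = box
    tx0, ty0, tx1, ty1 = true_box
    if x1 <= tx0 or x0 >= tx1 or y1 <= ty0 or y0 >= ty1:
        return None

    def step(a: int, b: int) -> int:
        # Move a one unit toward b (no change if already equal).
        return a + (b > a) - (b < a)

    return (step(x0, tx0), step(y0, ty0), step(x1, tx1), step(y1, ty1))


if __name__ == "__main__":
    candidates = [(5, 8, 60, 25), (100, 100, 120, 110)]
    print(detect_text_lines(candidates, toy_refine))
```

In this toy run the first candidate tightens onto the assumed true region and the second is rejected because it never overlaps it. The stopping rule compares boxes for exact equality, which mirrors the "until the detected text line no longer changes" condition; a real system might instead stop when the change falls below a tolerance.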

Acknowledgments

This work is supported by the National Natural Science Foundation of China (91520301).

Author information

Corresponding author

Correspondence to Yonghong Song.

Cite this article

Wang, X., Song, Y., Zhang, Y. et al. A hierarchical recursive method for text detection in natural scene images. Multimed Tools Appl 76, 26201–26223 (2017). https://doi.org/10.1007/s11042-016-4099-2

  • DOI: https://doi.org/10.1007/s11042-016-4099-2
