Skip to main content
Log in

Scene text detection with fully convolutional neural networks

  • Published:
Multimedia Tools and Applications Aims and scope Submit manuscript

Abstract

Text detection in scene image has become a hot topic in computer vision and artificial intelligence research, due to its wide range of applications and challenges. Most state-of-the-art methods for text detection based on deep learning rely on text bounding box regression. These methods can not well handle the case that if the scene text is curved. In this paper, we propose a new framework for arbitrarily oriented text detection in natural images based on fully convolutional neural networks. The main idea is to represent a text instance by two forms: text center block and word stroke region. These two elements are detected by two fully convolutional networks, respectively. Final detections are produced by the word region surrounding box algorithm. The proposed method does not need to regress the extant bounding box of the text instance, mainly because the predicted text block region itself implicitly contains position and orientation information. Besides, our method can well handle text in different languages, arbitrary orientations, curved shape and various fonts. To validate the effectiveness of the proposed method, we perform experiments on three public datasets: MSRA-TD500, USTB-SV1K and ICDAR2013, and compare it with other state-of-the-art methods. Experiment results demonstrate that the proposed method achieves competitive results. Based on VGG-16, our method achieves an F-measure of 78.84% on MSRA-TD500, 59.34% on USTB-SV1K, and 88.21% on ICDAR2013.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10

Similar content being viewed by others

References

  1. Bai X, Yang M, Lyu P, Xu Y (2017) Integrating scene text and visual appearance for fine-grained image classification with convolutional neural networks. arXiv:1704.04613

  2. Busta M, Neumann L, Matas J (2015) Fastext: efficient unconstrained scene text detector. In: IEEE international conference on computer vision, pp 1206–1214

  3. Cho H, Sung M, Jun B (2016) Canny text detector: fast and robust scene text localization algorithm. In: IEEE conference on computer vision and pattern recognition, pp 3566–3573

  4. Deng D, Liu H, Li X, Cai D (2018) Pixellink: detecting scene text via instance segmentation. arXiv:1801.01315

  5. Eom S, Huh JH (2018) Group signature with restrictive linkability: minimizing privacy exposure in ubiquitous environment. J Ambient Intell Humaniz Comput: 1–11

  6. Epshtein B, Ofek E, Wexler Y (2010) Detecting text in natural scenes with stroke width transform. In: IEEE conference on computer vision and pattern recognition, pp 2963–2970

  7. He D, Yang X, Liang C, Zhou Z, Ororbia AG, Kifer D, Giles CL (2017) Multi-scale fcn with cascaded instance aware segmentation for arbitrary oriented word spotting in the wild. In: IEEE conference on computer vision and pattern recognition, pp 3519–3528

  8. He P, Huang W, He T, Zhu Q, Qiao Y, Li X (2017) Single shot text detector with regional attention. In: IEEE international conference on computer vision, pp 3047–3055

  9. He T, Huang W, Qiao Y, Yao J (2016) Accurate text localization in natural image with cascaded convolutional text network. arXiv:1603.09423

  10. Huang L, Yang Y, Deng Y, Yu Y (2015) Densebox: unifying landmark localization with end to end object detection. arXiv:1509.04874

  11. Huang W, Lin Z, Yang J, Wang J (2013) Text localization in natural images using stroke feature transform and text covariance descriptors. In: IEEE international conference on computer vision, pp 1241–1248

  12. Huang W, Qiao Y, Tang X (2014) Robust scene text detection with convolution neural network induced mser trees. In: European conference on computer vision, pp 497–511

  13. Huh JH, Otgonchimeg S, Seo K (2016) Advanced metering infrastructure design and test bed experiment using intelligent agents: focusing on the plc network base technology for smart grid system. J Supercomput 72(5):1862–1877

    Article  Google Scholar 

  14. Jaderberg M, Vedaldi A, Zisserman A (2014) Deep features for text spotting. In: European conference on computer vision, pp 512–528

  15. Jia Y, Shelhamer E, Donahue J, Karayev S, Long J, Girshick R, Guadarrama S, Darrell T (2014) Caffe: convolutional architecture for fast feature embedding. In: ACM international conference on multimedia, pp 675–678

  16. Jiang Y, Zhu X, Wang X, Yang S, Li W, Wang H, Fu P, Luo Z (2017) R2cnn: rotational region cnn for orientation robust scene text detection. arXiv:1706.09579

  17. Kang L, Li Y, Doermann D (2014) Orientation robust text line detection in natural images. In: IEEE conference on computer vision and pattern recognition, pp 4034–4041

  18. Karaoglu S, Tao R, Gevers T, Smeulders AW (2017) Words matter: scene text for image classification and retrieval. IEEE Trans Multimedia 19(5):1063–1076

    Article  Google Scholar 

  19. Khare V, Shivakumara P, Paramesran R, Blumenstein M (2017) Arbitrarily-oriented multi-lingual text detection in video. Multimed Tools Appl 76 (15):16625–16655

    Article  Google Scholar 

  20. Liao M, Shi B, Bai X, Wang X, Liu W (2017) Textboxes: a fast text detector with a single deep neural network. In: AAAI conference on artificial intelligence, pp 4161–4167

  21. Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: IEEE international conference on computer vision, pp 2980–2988

  22. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) Ssd: single shot multibox detector. In: European conference on computer vision, pp 21–37

  23. Liu X, Liang D, Yan S, Chen D, Qiao Y, Yan J (2018) Fots: fast oriented text spotting with a unified network. arXiv:1801.01671

  24. Liu Y, Jin L (2017) Deep matching prior network: toward tighter multi-oriented text detection. In: IEEE conference on computer vision and pattern recognition, vol 2, p 8

  25. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: IEEE conference on computer vision and pattern recognition, pp 3431–3440

  26. Loukhaoukha K, Chouinard JY, Berdai A (2012) A secure image encryption algorithm based on rubik’s cube principle. J Electr Comput Eng 2012:7

    MathSciNet  MATH  Google Scholar 

  27. Matas J, Chum O, Urban M, Pajdla T (2004) Robust wide-baseline stereo from maximally stable extremal regions. Image Vis Comput 22(10):761–767

    Article  Google Scholar 

  28. Nikoloudakis Y, Panagiotakis S, Markakis E, Pallis E, Mastorakis G, Mavromoustakis CX, Dobre C (2016) A fog-based emergency system for smart enhanced living environments. IEEE Cloud Computing 3(6): 54–62

  29. Ren S, He K, Girshick R, Sun J (2015) Faster r-cnn: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99

  30. Shi B, Bai X, Belongie S (2017) Detecting oriented text in natural images by linking segments. In: IEEE conference on computer vision and pattern recognition, pp 2550–2558

  31. Shi B, Bai X, Yao C (2017) An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Trans Pattern Anal Mach Intell 39(11):2298–2304

    Article  Google Scholar 

  32. Tian S, Pan Y, Huang C, Lu S, Yu K, Lim Tan C (2015) Text flow: a unified text detection system in natural scene images. In: IEEE international conference on computer vision, pp 4651–4659

  33. Tian S, Pei WY, Zuo ZY, Yin X (2016) Scene text detection in video by learning locally and globally. In: International joint conference on artificial intelligence, pp 2647–2653

  34. Wolf C, Jolion JM (2006) Object count/area graphs for the evaluation of object detection and segmentation algorithms. Int J Doc Anal Recognit 8(4):280–296

    Article  Google Scholar 

  35. Xie S, Tu Z (2015) Holistically-nested edge detection. In: IEEE international conference on computer vision, pp 1395–1403

  36. Yao C, Bai X, Liu W, Ma Y, Tu Z (2012) Detecting texts of arbitrary orientations in natural images. In: IEEE conference on computer vision and pattern recognition, pp 1083–1090

  37. Yao C, Bai X, Sang N, Zhou X, Zhou S, Cao Z (2016) Scene text detection via holistic, multi-channel prediction. arXiv:1606.09002

  38. Yi C, Tian Y (2011) Assistive text reading from complex background for blind persons. In: International workshop on camera-based document analysis and recognition, pp 15–28

  39. Yin X, Pei WY, Zhang J, Hao HW (2015) Multi-orientation scene text detection with adaptive clustering. IEEE Trans Pattern Anal Mach Intell 37 (9):1930–1937

    Article  Google Scholar 

  40. Yin X, Yin X, Huang K, Hao HW (2014) Robust text detection in natural scene images. IEEE Trans Pattern Anal Mach Intell 36(5):970–983

    Article  Google Scholar 

  41. Zhang Z, Shen W, Yao C, Bai X (2015) Symmetry-based text line detection in natural scenes. In: IEEE conference on computer vision and pattern recognition, pp 2558–2567

  42. Zhang Z, Zhang C, Shen W, Yao C, Liu W, Bai X (2016) Multi-oriented text detection with fully convolutional networks. In: IEEE conference on computer vision and pattern recognition, pp 4159–4167

  43. Zhou X, Yao C, Wen H, Wang Y, Zhou S, He W, Liang J (2017) East: an efficient and accurate scene text detector. In: IEEE conference on computer vision and pattern recognition, pp 2642–2651

Download references

Acknowledgements

This work is supported in part to Zhandong Liu by Natural Science Foundation of China under contract No. 61662082 and No. U1703261, and in part to Dr. Wengang Zhou by Natural Science Foundation of China under contract No. 61632019, the Fundamental Research Funds for the Central Universities, and Young Elite Scientists Sponsorship Program By CAST (No. 2016QNRC001), and in part to Dr. Houqiang Li by Natural Science Foundation of China under contract No. 61836011 and No. 61390514.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Houqiang Li.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Liu, Z., Zhou, W. & Li, H. Scene text detection with fully convolutional neural networks. Multimed Tools Appl 78, 18205–18227 (2019). https://doi.org/10.1007/s11042-019-7177-4

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11042-019-7177-4

Keywords

Navigation