
Scene text detection by adaptive feature selection with text scale-aware loss

Applied Intelligence

Abstract

Since convolutional neural networks (CNNs) were first applied to scene text detection, detection accuracy has improved considerably. However, because regular CNNs have limited receptive fields and texts in images vary greatly in scale, current text detection methods may fail on more challenging text instances, such as arbitrarily shaped texts and extremely small texts. In this paper, we propose a new segmentation-based scene text detector equipped with deformable convolution and global channel attention. To detect texts of arbitrary shapes, our method replaces regular convolutions with deformable convolutions, whose sampling locations are shifted by additional learned offsets so that they can better adapt to texts of any shape, especially curved texts. To obtain more representative text features, an Adaptive Feature Selection module is introduced that better exploits text content through global channel attention. Meanwhile, a scale-aware loss, which adjusts the weights of text instances according to their sizes, is formulated to address the text scale variation problem. Experiments on several standard benchmarks, including ICDAR2015, SCUT-CTW1500, ICDAR2017-MLT and MSRA-TD500, verify the superiority of the proposed method.
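The abstract does not describe the internals of the Adaptive Feature Selection module. As a rough illustration of what "global channel attention" usually means, the sketch below follows the squeeze-and-excitation pattern: each channel is globally average-pooled, the resulting descriptor passes through a small two-layer bottleneck and a sigmoid, and the output rescales that channel. The SE-style design, the function names and the hand-set weights are assumptions made for illustration, not the authors' module; a real detector would use framework tensors and learned weights.

```python
import math
from typing import List

Matrix = List[List[float]]  # one H x W feature channel, or a weight matrix


def global_avg_pool(channel: Matrix) -> float:
    """Squeeze: reduce one H x W channel to its mean value."""
    total = sum(sum(row) for row in channel)
    count = sum(len(row) for row in channel)
    return total / count


def dense(x: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    """Fully connected layer; `weights` has shape (out_dim, in_dim)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def channel_attention(
    features: List[Matrix],
    w1: Matrix, b1: List[float],  # reduction layer, shape (C // r, C)
    w2: Matrix, b2: List[float],  # expansion layer, shape (C, C // r)
) -> List[Matrix]:
    """Hypothetical SE-style global channel attention over C channels."""
    squeezed = [global_avg_pool(ch) for ch in features]        # C global descriptors
    hidden = [max(0.0, h) for h in dense(squeezed, w1, b1)]    # bottleneck + ReLU
    scales = [sigmoid(s) for s in dense(hidden, w2, b2)]       # one weight in (0, 1) per channel
    # Excitation: multiply every value of channel c by scales[c].
    return [[[v * s for v in row] for row in ch] for ch, s in zip(features, scales)]
```

For example, with two 2 x 2 channels (one all ones, one all zeros), reduction ratio r = 2, w1 = [[1.0, 0.0]], b1 = [0.0], w2 = [[10.0], [-10.0]] and b2 = [0.0, 0.0], the first channel is kept almost unchanged (scale sigmoid(10) ≈ 0.99995) while the second is suppressed. Using a per-channel sigmoid rather than a softmax lets several channels be emphasized at the same time.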
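Deformable convolution, as formulated by Dai et al. (2017), computes each output as y(p0) = Σ_n w(p_n) · x(p0 + p_n + Δp_n), where p_n runs over the regular kernel grid and Δp_n are fractional offsets, so the input has to be read with bilinear interpolation. The single-channel sketch below evaluates one output location of a 3 x 3 deformable convolution. In a real network the offsets are predicted by an extra convolution layer; here they are supplied by hand, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]

# Regular 3 x 3 sampling grid R as (dy, dx) displacements from the output location p0.
GRID: List[Tuple[int, int]] = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def bilinear(x: Matrix, py: float, px: float) -> float:
    """Read x at a fractional position; pixels outside the map count as zero."""
    h, w = len(x), len(x[0])
    y0, x0 = math.floor(py), math.floor(px)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Weight falls off linearly with distance from the integer pixel.
                value += (1 - abs(py - yy)) * (1 - abs(px - xx)) * x[yy][xx]
    return value


def deformable_conv_at(
    x: Matrix,
    kernel: Sequence[float],                  # 9 weights, in GRID order
    offsets: Sequence[Tuple[float, float]],   # 9 (dy, dx) offsets, in GRID order
    p0: Tuple[int, int],
) -> float:
    """One output value y(p0) = sum_n w(p_n) * x(p0 + p_n + delta_p_n)."""
    row, col = p0
    return sum(
        w * bilinear(x, row + dy + oy, col + dx + ox)
        for (dy, dx), w, (oy, ox) in zip(GRID, kernel, offsets)
    )
```

With all offsets set to zero this reduces to an ordinary 3 x 3 convolution at p0; non-zero offsets let the nine taps follow, for instance, a curved text stroke instead of a fixed square window.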
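The abstract states only that the scale-aware loss reweights text instances according to their size; the exact formula is not given here. The sketch below substitutes a well-known scheme with the same goal, the instance-balanced weighting of PixelLink (Deng et al., 2018): every text instance receives the same total weight B = S / N, where S is the number of text pixels and N the number of instances, so each pixel of instance i gets B / S_i and small texts are not swamped by large ones. These weights then scale a pixel-wise binary cross-entropy on the predicted text map. Giving background pixels weight 1.0, normalizing by the weight sum, and all function names are assumptions of this sketch, not the paper's loss.

```python
import math
from typing import Dict, List

IntMap = List[List[int]]
FloatMap = List[List[float]]


def instance_balanced_weights(instance_map: IntMap) -> FloatMap:
    """Per-pixel weights from a label map: 0 = background, k > 0 = text instance k."""
    areas: Dict[int, int] = {}
    for row in instance_map:
        for k in row:
            if k > 0:
                areas[k] = areas.get(k, 0) + 1
    if not areas:  # no text in this image: fall back to an unweighted loss
        return [[1.0 for _ in row] for row in instance_map]
    per_instance = sum(areas.values()) / len(areas)  # B = S / N
    # A pixel of instance k gets B / S_k; background keeps weight 1.0 (assumption).
    return [[per_instance / areas[k] if k > 0 else 1.0 for k in row] for row in instance_map]


def weighted_bce(probs: FloatMap, labels: IntMap, weights: FloatMap, eps: float = 1e-7) -> float:
    """Weighted binary cross-entropy, normalized by the total weight."""
    num, den = 0.0, 0.0
    for p_row, y_row, w_row in zip(probs, labels, weights):
        for p, y, w in zip(p_row, y_row, w_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            num -= w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
            den += w
    return num / den
```

For the label map [[1, 1, 1, 2, 0]], the three pixels of instance 1 each get weight 2/3 and the single pixel of instance 2 gets weight 2, so both instances contribute equally; missing the one-pixel text therefore costs more than missing one pixel of the larger text.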






Acknowledgements

This research is supported by the National Natural Science Foundation of China (No. 61972180).

Author information

Correspondence to Qin Wu.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.


About this article


Cite this article

Wu, Q., Luo, W., Chai, Z. et al. Scene text detection by adaptive feature selection with text scale-aware loss. Appl Intell 52, 514–529 (2022). https://doi.org/10.1007/s10489-021-02331-4



  • DOI: https://doi.org/10.1007/s10489-021-02331-4
