Scene text detection by adaptive feature selection with text scale-aware loss

Wu, Qin; Luo, Wenli; Chai, Zhilei; Guo, Guodong

doi:10.1007/s10489-021-02331-4

Published: 05 May 2021

Applied Intelligence, volume 52, pages 514–529 (2022)

Qin Wu (ORCID: orcid.org/0000-0001-8087-3001)^1,2,
Wenli Luo^1,
Zhilei Chai^1,2 &
Guodong Guo^3

Abstract

Since convolutional neural networks (CNNs) were first applied to scene text detection, detection accuracy has improved substantially. However, because regular CNNs have limited receptive fields and text in images varies widely in scale, current methods may still fail on more challenging text instances, such as arbitrarily shaped text and extremely small text. In this paper, we propose a new segmentation-based scene text detector equipped with deformable convolution and global channel attention. To detect text of arbitrary shape, our method replaces traditional convolutions with deformable convolutions, whose sampling locations are shifted by augmented offsets so that they adapt better to any text shape, especially curved text. To obtain more representative text features, an Adaptive Feature Selection module is introduced to better exploit text content through global channel attention. Meanwhile, a scale-aware loss, which adjusts the weights of text instances of different sizes, is formulated to address the text scale variation problem. Experiments on several standard benchmarks, including ICDAR2015, SCUT-CTW1500, ICDAR2017-MLT and MSRA-TD500, verify the superiority of the proposed method.
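
The abstract does not give the form of the scale-aware loss. As a rough illustration of the general idea only, the Python sketch below weights pixels so that every text instance contributes the same total weight regardless of its area, which makes each pixel of a small instance count more than a pixel of a large one. This instance-balanced scheme is an assumption chosen for illustration (it resembles the weighting used in PixelLink), not the formulation proposed in the paper. The function name scale_aware_weights and the toy instance_map are hypothetical.

```python
from collections import Counter


def scale_aware_weights(instance_map):
    """Return a per-pixel weight map for a 2D grid of instance ids.

    Assumption for this sketch: id 0 is background and gets weight 0;
    every text instance receives the same total weight, so pixels in
    small instances are weighted more heavily than pixels in large ones.
    """
    # Area (pixel count) of each text instance.
    counts = Counter(v for row in instance_map for v in row if v != 0)
    if not counts:
        return [[0.0 for _ in row] for row in instance_map]

    # Split the total number of text pixels evenly across instances.
    total_text_pixels = sum(counts.values())
    per_instance_budget = total_text_pixels / len(counts)

    # Each pixel gets its instance's budget divided by the instance area.
    return [
        [per_instance_budget / counts[v] if v != 0 else 0.0 for v in row]
        for row in instance_map
    ]


# Toy label map: instance 1 covers 8 pixels, instance 2 covers 2 pixels.
instance_map = [
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 2, 2],
]
weights = scale_aware_weights(instance_map)
```

In a training loop, such a weight map would typically multiply a per-pixel segmentation loss before averaging. How the paper's loss actually combines scale information with its segmentation objective is not stated in the abstract.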

Notes

https://rrc.cvc.uab.es/?ch=8

Acknowledgements

This research is supported by the National Natural Science Foundation of China (No. 61972180).

Author information

Authors and Affiliations

Department of Computer Science, Jiangnan University, Wuxi, 214122, China
Qin Wu, Wenli Luo & Zhilei Chai
Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, 214122, China
Qin Wu & Zhilei Chai
Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, 26505, USA
Guodong Guo

Corresponding author

Correspondence to Qin Wu.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wu, Q., Luo, W., Chai, Z. et al. Scene text detection by adaptive feature selection with text scale-aware loss. Appl Intell 52, 514–529 (2022). https://doi.org/10.1007/s10489-021-02331-4

Accepted: 06 March 2021
Published: 05 May 2021
Issue Date: January 2022
DOI: https://doi.org/10.1007/s10489-021-02331-4
