Abstract
Since convolutional neural networks(CNNs) were applied to scene text detection, the accuracy of text detection has been improved a lot. However, limited by the receptive fields of regular CNNs and due to the large scale variations of texts in images, current text detection methods may fail to detect some texts well when dealing with more challenging text instances, such as arbitrarily shaped texts and extremely small texts. In this paper, we propose a new segmentation based scene text detector, which is equipped with deformable convolution and global channel attention. In order to detect texts of arbitrary shapes, our method replaces traditional convolutions with deformable convolutions, the sampling locations of deformable convolutions are deformed with augmented offsets so that it can better adapt to any shapes of texts, especially curved texts. To get more representative features for texts, an Adaptive Feature Selection module is introduced to better exploit text content through global channel attention. Meanwhile, a scale-aware loss, which adjusts the weights of text instances with different sizes, is formulated to solve the text scale variation problem. Experiments on several standard benchmarks, including ICDAR2015, SCUT-CTW1500, ICDAR2017-MLT and MSRA-TD500 verify the superiority of the proposed method.
Similar content being viewed by others
References
Baek Y, Lee B, Han D, Yun S, Lee H (2019) Character region awareness for text detection. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 9357–9366
Chollet F (2017) Xception: Deep learning with depthwise separable convolutions. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 1800–1807
Dai J, Qi H, Xiong Y, Li Y, Zhang G, Hu H, Wei Y (2017) Deformable convolutional networks. In: 2017 IEEE international conference on computer vision (ICCV). https://doi.org/10.1109/ICCV.2017.89, pp 764–773,
Deng D, Liu H, Li X, Cai D (2018) Pixellink: Detecting scene text via instance segmentation. In: Proceedings of the thirty-second AAAI conference on artificial intelligence. https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16469. AAAI Press, pp 6773–6780
Epshtein B, Ofek E, Wexler Y (2010) Detecting text in natural scenes with stroke width transform. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp 2963–2970
He K, Zhang X, Ren S, Sun J (2016) Identity mappings in deep residual networks. In: Computer vision – ECCV 2016. Springer International Publishing, Cham, pp 630–645
He K, Gkioxari G, Dollár P, Girshick R (2020) Mask r-cnn. IEEE Trans Pattern Anal Mach Intell 42(2):386–397
He P, Huang W, He T, Zhu Q, Qiao Y, Li X (2017) Single shot text detector with regional attention. In: 2017 IEEE International conference on computer vision (ICCV), pp 3066–3074
He W, Zhang X, Yin F, Liu C (2017) Deep direct regression for multi-oriented scene text detection. In: 2017 IEEE international conference on computer vision (ICCV). https://doi.org/10.1109/ICCV.2017.87, pp 745–753
Hu H, Zhang C, Luo Y, Wang Y, Han J, Ding E (2017) Wordsup: Exploiting word annotations for character based text detection. In: 2017 IEEE International conference on computer vision (ICCV), pp 4950–4959
Hu J, Shen L, Albanie S, Sun G, Wu E (2020) Squeeze-and-excitation networks. IEEE Trans Pattern Anal Mach Intell 42(8):2011–2023
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 2261–2269
Karatzas D, Gomez-Bigorda L, Nicolaou A, Ghosh S, Bagdanov A, Iwamura M, Matas J, Neumann L, Chandrasekhar VR, Lu S, Shafait F, Uchida S, Valveny E (2015) Icdar 2015 competition on robust reading. In: 2015 13th international conference on document analysis and recognition (ICDAR), pp 1156–1160
Liao M, Shi B, Bai X, Wang X, Liu W (2017) Textboxes: A fast text detector with a single deep neural network. In: Proceedings of the thirty-first AAAI conference on artificial intelligence, February 4-9, 2017, San Francisco, California, USA. http://aaai.org/ocs/index.php/AAAI/AAAI17/paper/view/14202. AAAI Press, pp 4161–4167
Liao M, Shi B, Bai X (2018) Textboxes++: A single-shot oriented scene text detector. IEEE Trans Image Process 27(8):3676–3690. https://doi.org/10.1109/TIP.2018.2825107
Liao M, Zhu Z, Shi B, Xia G, Bai X (2018) Rotation-sensitive regression for oriented scene text detection. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 5909–5918
Liao M, Lyu P, He M, Yao C, Wu W, Bai X (2021) Mask textspotter: An end-to-end trainable neural network for spotting text with arbitrary shapes. IEEE Trans Pattern Anal Mach Intell 43(2):532–548. https://doi.org/10.1109/TPAMI.2019.2937086
Lin T, Dollaŕ P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 936–944
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C Y, Berg A C (2016) Ssd: Single shot multibox detector. In: Computer vision – ECCV 2016. Springer International Publishing, Cham, pp 21–37
Liu X, Zhou G, Zhang R, Wei X (2020) An accurate segmentation-based scene text detector with context attention and repulsive text border. In: 2020 IEEE/CVF conference on computer vision and pattern recognition workshops (CVPRW). https://doi.org/10.1109/CVPRW50498.2020.00283, pp 2344–2352
Liu Y, Jin L, Zhang S, Luo C, Zhang S (2019) Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recognit 90(2019):337–345. https://doi.org/10.1016/j.patcog.2019.02.002
Liu Z, Lin G, Yang S, Feng J, Lin W, Goh WL (2018) Learning Markov clustering networks for scene text detection. In: 2018 IEEE/CVF Conference on computer vision and pattern recognition, pp 6936–6944
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. IEEE Trans Pattern Anal Mach Intell 39(4):640–651
Long S, Ruan J, Zhang W, He X, Wu W, Yao C (2018) Textsnake: A flexible representation for detecting text of arbitrary shapes. In: Computer vision – ECCV 2018. Springer International Publishing, Cham, pp 19–35
Lyu P, Yao C, Wu W, Yan S, Bai X (2018) Multi-oriented scene text detection via corner localization and region segmentation. In: 2018 IEEE/CVF conference on computer vision and pattern recognition, pp 7553–7563
Ma J, Shao W, Ye H, Wang L, Wang H, Zheng Y, Xue X (2018) Arbitrary-oriented scene text detection via rotation proposals. IEEE Trans Multimed 20(11):3111–3122
Nayef N, Yin F, Bizid I, Choi H, Feng Y, Karatzas D, Luo Z, Pal U, Rigaud C, Chazalon J, Khlif W, Luqman M M, Burie J, Liu C, Ogier J (2017) Icdar2017 robust reading challenge on multi-lingual scene text detection and script identification - rrc-mlt. In: 2017 14th IAPR international conference on document analysis and recognition (ICDAR), vol 01, pp 1454–1459
Neumann L, Matas J (2011) A method for text localization and recognition in real-world images. In: Computer vision – ACCV 2010. Springer, Berlin, pp 770–783
Neumann L, Matas J (2012) Real-time scene text localization and recognition. In: 2012 IEEE conference on computer vision and pattern recognition, pp 3538–3545
Ren S, He K, Girshick R, Sun J (2017) Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149. https://doi.org/10.1109/TPAMI.2016.2577031
Shi B, Bai X, Belongie S (2017) Detecting oriented text in natural images by linking segments. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR.2017.371. IEEE Computer Society, Los Alamitos, pp 3482–3490
Shrivastava A, Gupta A, Girshick R (2016) Training region-based object detectors with online hard example mining. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 761–769
Soni R, Kumar B, Chand S (2019) Text detection and localization in natural scene images based on text awareness score. Appl Intell 49(2019):1376–1405. https://doi.org/10.1007/s10489-018-1338-4
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 2818–2826
Tian Z, Huang W, He T, He P, Qiao Y (2016) Detecting text in natural image with connectionist text proposal network. In: Computer vision – ECCV 2016. Springer International Publishing, Cham, pp 56–72
Tian Z, Shu M, Lyu P, Li R, Zhou C, Shen X, Jia J (2019) Learning shape-aware embedding for scene text detection. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR.2019.00436, pp 4229–4238
Wang W, Xie E, Li X, Hou W, Lu T, Yu G, Shao S (2019) Shape robust text detection with progressive scale expansion network. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 9328–9337
Xie E, Zang Y, Shao S, Yu G, Yao C, Li G (2019) Scene text detection with supervised pyramid context network. In: The thirty-third AAAI conference on artificial intelligence, AAAI 2019. https://doi.org/10.1609/aaai.v33i01.33019038. AAAI Press, pp 9038–9045
Yao C, Bai X, Liu W, Ma Y, Tu Z (2012) Detecting texts of arbitrary orientations in natural images. In: 2012 IEEE conference on computer vision and pattern recognition. https://doi.org/10.1109/CVPR.2012.6247787, pp 1083–1090
Yao C, Bai X, Liu W (2014) A unified framework for multioriented text detection and recognition. Image Processing IEEE Transactions on 23(11):4737–4749
Zhang C, Liang B, Huang Z, En M, Han J, Ding E, Ding X (2019) Look more than once: An accurate detector for text of arbitrary shapes. In: 2019 IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/CVPR.2019.01080, pp 10544–10553
Zhang Z, Shen W, Yao C, Bai X (2015) Symmetry-based text line detection in natural scenes. In: 2015 IEEE Conference on computer vision and pattern recognition (CVPR), pp 2558–2567
Zhang Z, Zhang C, Shen W, Yao C, Liu W, Bai X (2016) Multi-oriented text detection with fully convolutional networks. In: 2016 IEEE conference on computer vision and pattern recognition (CVPR), pp 4159–4167
Zhong Z, Jin L, Huang S (2017) Deeptext: A new approach for text proposal generation and text detection in natural images. In: 2017 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 1208–1212
Zhou X, Yao C, Wen H, Wang Y, Zhou S, He W, Liang J (2017) East: An efficient and accurate scene text detector. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 2642–2651
Acknowledgements
This research is supported by the National Natural Science Foundation of China (No. 61972180).
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Wu, Q., Luo, W., Chai, Z. et al. Scene text detection by adaptive feature selection with text scale-aware loss. Appl Intell 52, 514–529 (2022). https://doi.org/10.1007/s10489-021-02331-4
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10489-021-02331-4