Abstract
Natural scene text detection is a challenging task. Existing methods based on quadrilateral bounding box regression can locate horizontal and multi-oriented text, but they have great difficulty locating arbitrary-shaped text because the quadrilateral bounding box template is limited in shape. Previous segmentation-based methods, which perform pixel-level classification and separate adjacent text instances by predicting center lines of fixed width, can locate the boundaries of arbitrary-shaped text. However, when the width of the center lines is not appropriate, the detected text regions may stick together or break into multiple areas, giving sub-optimal results. In this paper, a novel natural scene text detector based on a distance map is proposed. By adjusting the width of the center line, the method can detect arbitrary-shaped text more flexibly and robustly. Experimental results on several datasets demonstrate that the proposed method is more competitive than methods based on fixed-width center lines and obtains state-of-the-art or comparable performance on CTW1500, ICDAR 2015 and Total-Text. Notably, the proposed method achieves F-measures of 85.4% on the ICDAR 2015 dataset and 81.6% on the Total-Text dataset. Code is available at: https://github.com/Whu-wxy/DistNet.
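To make the idea of an adjustable center-line width concrete, the following is a minimal sketch and not the authors' implementation (which is in the repository linked above). It assumes that the distance map stores, for every text pixel, the normalized Euclidean distance to the nearest background pixel, and that a center region is obtained by thresholding that map; both the thresholding step and the helper names `distance_map` and `center_region` are assumptions made for illustration only.

```python
import math


def distance_map(mask):
    """Normalized distance of each text pixel to the nearest background pixel.

    Brute-force illustration for toy masks (hypothetical helper).
    Pixels just outside the image border are treated as background.
    """
    h, w = len(mask), len(mask[0])
    # Collect background coordinates, including a one-pixel frame around the image.
    background = [
        (y, x)
        for y in range(-1, h + 1)
        for x in range(-1, w + 1)
        if not (0 <= y < h and 0 <= x < w) or mask[y][x] == 0
    ]
    dist = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                dist[y][x] = min(math.hypot(y - by, x - bx) for by, bx in background)
    peak = max(max(row) for row in dist)
    if peak > 0:
        # Scale so that the innermost text pixels have value 1.0.
        dist = [[d / peak for d in row] for row in dist]
    return dist


def center_region(dist, threshold):
    """Binary center region: text pixels whose normalized distance >= threshold."""
    return [[1 if d > 0 and d >= threshold else 0 for d in row] for row in dist]


# Toy example: one 5x9 text bar inside a 7x11 image.
mask = [[1 if 1 <= y <= 5 and 1 <= x <= 9 else 0 for x in range(11)] for y in range(7)]
dist = distance_map(mask)

# A higher threshold gives a thinner center region; a lower one gives a wider region.
widths = {t: sum(map(sum, center_region(dist, t))) for t in (0.3, 0.5, 0.9)}
print(widths)  # {0.3: 45, 0.5: 21, 0.9: 5}
```

In this toy case a single map yields center regions of 45, 21 and 5 pixels depending on the threshold, which is the kind of flexibility that a fixed-width center line lacks. On real images a library distance transform would normally replace the brute-force loop.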
Acknowledgements
This work was supported by the National Key R&D Program of China [grant number 2021YFB2206200] and the National Science and Technology Major Project [grant number 2017ZX01030102]. The numerical calculations in this paper were performed on the supercomputing system in the Supercomputing Center of Wuhan University.
Contributions
Xinyu Wang: proposing the method conception, designing and carrying out the work, and drafting the manuscript. Yaohua Yi: improving the method conception, offering crucial suggestions, and verifying the final submitted version. Jibing Peng: revising the manuscript and processing data. Kaili Wang: improving the method conception, carrying out the work, and revising the manuscript.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Wang, X., Yi, Y., Peng, J. et al. Arbitrary-shaped scene text detection by predicting distance map. Appl Intell 52, 14374–14386 (2022). https://doi.org/10.1007/s10489-021-03065-z
DOI: https://doi.org/10.1007/s10489-021-03065-z