Label distribution learning for scene text detection

  • Research Article
  • Frontiers of Computer Science

Abstract

Recently, segmentation-based scene text detection has drawn wide research interest because it can flexibly describe scene text instances of arbitrary shape, such as curved text. However, existing methods usually need complex post-processing stages to handle ambiguous labels, i.e., the labels of pixels near the text boundary, which may belong to either the text or the background. In this paper, we present a framework for segmentation-based scene text detection that learns from ambiguous labels. We use label distribution learning to handle the label ambiguity of text annotations, which achieves good performance without an additional post-processing stage. Experiments on benchmark datasets demonstrate that our method produces better results than state-of-the-art methods for segmentation-based scene text detection.
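
To make the idea of learning from ambiguous boundary labels concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each pixel receives a two-class label distribution (text, background) derived from its signed distance to the annotated text boundary, and that predictions are compared with that distribution using KL divergence, a common choice in label distribution learning. The function names, the sigmoid mapping, and the `width` parameter are illustrative assumptions.

```python
import math


def soft_label(signed_dist, width=2.0):
    """Return a (p_text, p_background) label distribution for one pixel.

    signed_dist > 0 means the pixel lies inside the annotated text region,
    signed_dist < 0 means it lies outside. `width` is a hypothetical
    parameter that controls how wide the ambiguous band around the
    boundary is; it is not taken from the paper.
    """
    p_text = 1.0 / (1.0 + math.exp(-signed_dist / width))
    return (p_text, 1.0 - p_text)


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) between two discrete distributions."""
    return sum(
        t * math.log((t + eps) / (p + eps))
        for t, p in zip(target, predicted)
        if t > 0
    )


# A row of pixels crossing a text boundary (signed distances in pixels).
distances = [-6.0, -2.0, 0.0, 2.0, 6.0]
targets = [soft_label(d) for d in distances]

# A hard 0/1 prediction is penalised on the boundary pixel, where the
# target distribution is evenly split between text and background.
boundary_loss = kl_divergence(targets[2], (0.99, 0.01))
```

Pixels far from the boundary keep nearly hard labels, while pixels on the boundary receive a close to even split, so a model trained this way is not forced to commit to one class where the annotation itself is uncertain.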

Acknowledgements

This work was supported by the National Key R&D Program of China (2018AAA0100104, 2018AAA0100100), the National Natural Science Foundation of China (Grant No. 61702095), and the Natural Science Foundation of Jiangsu Province (BK20211164).

Author information

Corresponding author

Correspondence to Yu Zhang.

Additional information

Haoyu Ma is currently a master's candidate in computer science at Southeast University, China. He received his BS degree from Capital Normal University, China in 2018. His research interests include machine learning, pattern recognition, and scene text detection.

Ningning Lu received the BSc degree in mechanical design and automation from Hefei University of Technology, China in 2010 and the MSc degree in computer science from Southeast University, China in 2021. His research interests include machine learning, pattern recognition, computer vision, and cyber security.

Junjun Mei is ZTE's chief R&D engineer in the field of audio and video, engaged in research on the overall architecture of the integrated video cloud network and on key technologies such as computer vision, audio and video coding, and audio and video transmission, and has presided over the R&D and design of a number of system solutions.

Tao Guan is a senior system architect at ZTE, China, mainly engaged in the architecture design and algorithm research of video systems and industrial digital systems. Tao Guan has participated in standards organizations, initiated and compiled a number of communication standards, and applied for more than 20 national

Yu Zhang is currently an associate professor with the School of Computer Science and Engineering, Southeast University, China. He received his BS and MS degrees in telecommunications engineering from Xidian University, China in 2001 and 2004, respectively, and his PhD degree from Nanyang Technological University, Singapore in 2014. His research areas include computer vision, machine learning, object recognition, video analysis, human action analysis, and 3D pose estimation.

Xin Geng received the BS and MS degrees in computer science from Nanjing University, China in 2001 and 2004, respectively, and the PhD degree from Deakin University, Australia in 2008. He joined the School of Computer Science and Engineering at Southeast University, China in 2008, and is currently a professor and vice dean of the school. His research interests include pattern recognition, machine learning, and computer vision. He has authored over 50 refereed papers and holds five patents in these areas.

About this article

Cite this article

Ma, H., Lu, N., Mei, J. et al. Label distribution learning for scene text detection. Front. Comput. Sci. 17, 176339 (2023). https://doi.org/10.1007/s11704-022-1446-5
