Skip to main content

Text Detection by Jointly Learning Character and Word Regions

  • Conference paper
  • First Online:
Book cover Document Analysis and Recognition – ICDAR 2021 (ICDAR 2021)

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12821))

Included in the following conference series:

Abstract

Text detection in natural scenes has developed significantly in recent years. Segmentation-based methods are widely used for text detection because they are robust to detect text of any shape. However, most state-of-the-art works are limited to word/line level detection as character-level data annotation is too expensive. Considering the close connection between characters and words, we propose a detector containing four different headers: Gaussian map, offset map, mask map, and centerline map, to obtain word-level and character-level prediction results simultaneously. Besides, we design a weakly supervised method to fully use the word-level labels of the real dataset to generate character-level pseudo-labels for training. We perform rigorous experiments on multiple benchmark datasets. Results demonstrate that our method achieves state-of-the-art results. Specifically, we achieve an F-measure of 85.2 on the dataset CTW1500, which is 1.3% higher than the state-of-the-art methods.

This paper is supported by NSFC (No. 61772330, 61876109), Shanghai Municipal Science and Technology Major Project (2021SHZDZX0102), the Shanghai Key Laboratory of Crime Scene Evidence (No. 2017XCWZK01), and the Interdisciplinary Program of Shanghai Jiao Tong University (No. YG2019QNA09).

H. Lu—Also with the MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Baek, Y., Lee, B., Han, D., Yun, S., Lee, H.: Character region awareness for text detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9365–9374 (2019)

    Google Scholar 

  2. Cheng, Z., Bai, F., Xu, Y., Zheng, G., Pu, S., Zhou, S.: Focusing attention: towards accurate text recognition in natural images. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5076–5084 (2017)

    Google Scholar 

  3. Ch’ng, C.K., Chan, C.S.: Total-text: a comprehensive dataset for scene text detection and recognition. In: 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), vol. 1, pp. 935–942. IEEE (2017)

    Google Scholar 

  4. Deng, D., Liu, H., Li, X., Cai, D.: PixelLink: detecting scene text via instance segmentation. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 32 (2018)

    Google Scholar 

  5. Epshtein, B., Ofek, E., Wexler, Y.: Detecting text in natural scenes with stroke width transform. In: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 2963–2970. IEEE (2010)

    Google Scholar 

  6. Gupta, A., Vedaldi, A., Zisserman, A.: Synthetic data for text localisation in natural images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2315–2324 (2016)

    Google Scholar 

  7. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)

    Google Scholar 

  8. Hu, H., Zhang, C., Luo, Y., Wang, Y., Han, J., Ding, E.: WordSup: exploiting word annotations for character based text detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4940–4949 (2017)

    Google Scholar 

  9. Karatzas, D., et al.: ICDAR 2015 competition on robust reading. In: 2015 13th International Conference on Document Analysis and Recognition (ICDAR), pp. 1156–1160. IEEE (2015)

    Google Scholar 

  10. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)

  11. Liao, M., Shi, B., Bai, X.: TextBoxes++: a single-shot oriented scene text detector. IEEE Trans. Image Process. 27(8), 3676–3690 (2018)

    Article  MathSciNet  Google Scholar 

  12. Liao, M., Shi, B., Bai, X., Wang, X., Liu, W.: TextBoxes: a fast text detector with a single deep neural network. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31 (2017)

    Google Scholar 

  13. Liao, M., Wan, Z., Yao, C., Chen, K., Bai, X.: Real-time scene text detection with differentiable binarization. In: AAAI, pp. 11474–11481 (2020)

    Google Scholar 

  14. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)

    Google Scholar 

  15. Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)

    Google Scholar 

  16. Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2

    Chapter  Google Scholar 

  17. Liu, Y., Jin, L.: Deep matching prior network: toward tighter multi-oriented text detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1962–1969 (2017)

    Google Scholar 

  18. Liu, Y., Jin, L., Fang, C.: Arbitrarily shaped scene text detection with a mask tightness text detector. IEEE Trans. Image Process. 29, 2918–2930 (2019)

    Article  Google Scholar 

  19. Liu, Y., Jin, L., Zhang, S., Luo, C., Zhang, S.: Curved scene text detection via transverse and longitudinal sequence connection. Pattern Recogn. 90, 337–345 (2019)

    Article  Google Scholar 

  20. Long, S., Ruan, J., Zhang, W., He, X., Wu, W., Yao, C.: TextSnake: a flexible representation for detecting text of arbitrary shapes. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11206, pp. 19–35. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01216-8_2

    Chapter  Google Scholar 

  21. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide-baseline stereo from maximally stable extremal regions. Image Vis. Comput. 22(10), 761–767 (2004)

    Article  Google Scholar 

  22. Milletari, F., Navab, N., Ahmadi, S.A.: V-Net: fully convolutional neural networks for volumetric medical image segmentation. In: 2016 4th International Conference on 3D Vision (3DV), pp. 565–571. IEEE (2016)

    Google Scholar 

  23. Papandreou, G., et al.: Towards accurate multi-person pose estimation in the wild. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4903–4911 (2017)

    Google Scholar 

  24. Qiao, L., et al.: Text perceptron: towards end-to-end arbitrary-shaped text spotting. Proc. AAAI Conf. Artif. Intell. 34, 11899–11907 (2020)

    Google Scholar 

  25. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2016)

    Article  Google Scholar 

  26. Shi, B., Bai, X., Belongie, S.: Detecting oriented text in natural images by linking segments. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2550–2558 (2017)

    Google Scholar 

  27. Shi, X., Chen, Z., Wang, H., Yeung, D.Y., Wong, W.K., Woo, W.: Convolutional LSTM network: a machine learning approach for precipitation nowcasting. In: Advances in Neural Information Processing Systems 28, pp. 802–810 (2015)

    Google Scholar 

  28. Shrivastava, A., Gupta, A., Girshick, R.: Training region-based object detectors with online hard example mining. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 761–769 (2016)

    Google Scholar 

  29. Tian, S., Lu, S., Li, C.: WeText: scene text detection under weak supervision. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1492–1500 (2017)

    Google Scholar 

  30. Tian, Z., Huang, W., He, T., He, P., Qiao, Yu.: Detecting text in natural image with connectionist text proposal network. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9912, pp. 56–72. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46484-8_4

    Chapter  Google Scholar 

  31. Tian, Z., et al.: Learning shape-aware embedding for scene text detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4234–4243 (2019)

    Google Scholar 

  32. Wang, W., et al.: Shape robust text detection with progressive scale expansion network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9336–9345 (2019)

    Google Scholar 

  33. Wang, W., et al.: Efficient and accurate arbitrary-shaped text detection with pixel aggregation network. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 8440–8449 (2019)

    Google Scholar 

  34. Wang, Y., Xie, H., Zha, Z.J., Xing, M., Fu, Z., Zhang, Y.: ContourNet: taking a further step toward accurate arbitrary-shaped scene text detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11753–11762 (2020)

    Google Scholar 

  35. Wu, Y., Natarajan, P.: Self-organized text detection with minimal post-processing via border learning. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5000–5009 (2017)

    Google Scholar 

  36. Xie, E., Zang, Y., Shao, S., Yu, G., Yao, C., Li, G.: Scene text detection with supervised pyramid context network. Proc. AAAI Conf. Artif. Intell. 33, 9038–9045 (2019)

    Google Scholar 

  37. Xu, Y., Wang, Y., Zhou, W., Wang, Y., Yang, Z., Bai, X.: TextField: learning a deep direction field for irregular scene text detection. IEEE Trans. Image Process. 28(11), 5566–5579 (2019)

    Article  MathSciNet  Google Scholar 

  38. Zhang, S.X., et al.: Deep relational reasoning graph network for arbitrary shape text detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9699–9708 (2020)

    Google Scholar 

  39. Zhou, X., et al.: EAST: an efficient and accurate scene text detector. In: Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 5551–5560 (2017)

    Google Scholar 

  40. Zhu, Y., Du, J.: TextMountain: accurate scene text detection via instance segmentation. Pattern Recogn. 110, 107336 (2020)

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Hongtao Lu .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Wu, D., Hu, X., Xie, Z., Li, H., Ali, U., Lu, H. (2021). Text Detection by Jointly Learning Character and Word Regions. In: Lladós, J., Lopresti, D., Uchida, S. (eds) Document Analysis and Recognition – ICDAR 2021. ICDAR 2021. Lecture Notes in Computer Science(), vol 12821. Springer, Cham. https://doi.org/10.1007/978-3-030-86549-8_20

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-86549-8_20

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86548-1

  • Online ISBN: 978-3-030-86549-8

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics