TKDN: Scene Text Detection via Keypoints Detection

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 11365)

Abstract

In the past few years, great efforts have been devoted to scene text detection. Nevertheless, efficient text detection in the wild remains a challenging problem. Methods for general object detection are usually limited in handling the arbitrary orientations and large aspect ratios of scene text. In this paper, we present a novel scene text detection method, the text keypoint detection network (TKDN), which treats text detection as a text keypoint detection task performed in a coarse-to-fine scheme. Specifically, TKDN first generates coarse text instance regions using a feature pyramid network (FPN), a region proposal network (RPN), and ResNet50. Within the coarse text regions, it then performs text keypoint detection, bounding box classification and regression, and text region segmentation in a multi-task way. In the inference stage, an effective post-processing algorithm combines the outputs of the three branches to obtain the final text keypoint detection results. The proposed TKDN approach outperforms the state-of-the-art approach and achieves an F-measure of 82.0% on the public-domain ICDAR2015 database.
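For readers unfamiliar with the reported metric, the following is a minimal sketch of how an F-measure is computed, assuming the standard definition as the harmonic mean of precision and recall. The function name `f_measure` and the precision and recall values are illustrative assumptions, not figures from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        # Avoid division by zero when nothing is detected or matched.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical values chosen only for illustration; they are NOT
# the precision and recall reported for TKDN.
example_precision = 0.85
example_recall = 0.80
print(round(f_measure(example_precision, example_recall), 4))
```

Because the harmonic mean is dominated by the smaller of its two inputs, a detector cannot reach a high F-measure by trading away recall for precision or vice versa.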

Acknowledgement

This research was supported in part by the Natural Science Foundation of China (grants 61732004, 61390511, and 61672496), External Cooperation Program of Chinese Academy of Sciences (CAS) (grant GJHZ1843), and Youth Innovation Promotion Association CAS (2018135).

Author information

Corresponding author

Correspondence to Hu Han.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Cui, Y., Li, J., Han, H., Shan, S., Chen, X. (2019). TKDN: Scene Text Detection via Keypoints Detection. In: Jawahar, C., Li, H., Mori, G., Schindler, K. (eds) Computer Vision – ACCV 2018. ACCV 2018. Lecture Notes in Computer Science, vol 11365. Springer, Cham. https://doi.org/10.1007/978-3-030-20873-8_15

  • DOI: https://doi.org/10.1007/978-3-030-20873-8_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-20872-1

  • Online ISBN: 978-3-030-20873-8

  • eBook Packages: Computer Science, Computer Science (R0)
