Text kernel calculation for arbitrary shape text detection

  • Original article
The Visual Computer

Abstract

With the rapid progress of deep learning, text detection has received increasing attention and has advanced considerably. Current mainstream approaches are usually based on instance segmentation, predicting for each pixel whether it belongs to text, because this representation can cope with arbitrarily shaped text. However, pixel-level prediction often causes neighboring text instances to merge, resulting in misdetections. To mitigate this problem, we propose an approach that calculates text kernels and determines which text instance each boundary pixel belongs to. In this way, all text instances are labeled uniformly, which facilitates model learning and effectively separates adjacent, touching texts. In addition, to cope with complex and variable text backgrounds, we propose a practical feature enhancement module that exploits features at different levels to represent text of diverse sizes. Compared with current state-of-the-art algorithms, our method is competitive, achieving F1-measures of 87.3%, 88.0%, 82.8%, 85.7%, and 90.0% on the ICDAR2015, MSRA-TD500, CTW1500, Total-Text, and ICDAR2013 datasets, respectively.
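The abstract does not specify how the text kernels are calculated or how boundary pixels are attributed, so the following is only a hedged illustration of the general kernel-then-expand idea that this family of detectors builds on, not the authors' algorithm. It assumes a PSENet/DB-style inward shrink offset d = A(1 - r^2)/L (polygon area A, perimeter L, shrink ratio r, with r = 0.4 assumed as a default), 4-connectivity, and a multi-source breadth-first expansion in which every text pixel outside a kernel joins the nearest kernel. The function names `shrink_offset`, `label_kernels`, and `assign_boundary_pixels` are hypothetical.

```python
from collections import deque
from math import hypot

# 4-connected neighbourhood used for both kernel labeling and expansion.
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def polygon_area_perimeter(points):
    """Shoelace area and perimeter of a closed polygon [(x, y), ...]."""
    twice_area, perimeter = 0.0, 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]
        twice_area += x1 * y2 - x2 * y1
        perimeter += hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def shrink_offset(points, ratio=0.4):
    """Assumed PSENet/DB-style inward offset d = A * (1 - r^2) / L."""
    area, perimeter = polygon_area_perimeter(points)
    return area * (1.0 - ratio ** 2) / perimeter


def label_kernels(kernel_mask):
    """Give each 4-connected kernel region an id 1..K (0 = background)."""
    h, w = len(kernel_mask), len(kernel_mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if not kernel_mask[y][x] or labels[y][x]:
                continue
            count += 1
            labels[y][x] = count
            queue = deque([(y, x)])
            while queue:
                cy, cx = queue.popleft()
                for dy, dx in NEIGHBORS:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < h and 0 <= nx < w
                            and kernel_mask[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


def assign_boundary_pixels(text_mask, kernel_labels):
    """Grow all kernels simultaneously inside the text mask (multi-source BFS).

    Each unlabeled text pixel takes the id of the kernel that reaches it first,
    i.e. the one at the smallest 4-connected path distance; ties go to the
    kernel whose frontier pixel is dequeued first. Text pixels that no kernel
    can reach stay 0.
    """
    h, w = len(text_mask), len(text_mask[0])
    out = [row[:] for row in kernel_labels]
    queue = deque((y, x) for y in range(h) for x in range(w) if out[y][x])
    while queue:
        cy, cx = queue.popleft()
        for dy, dx in NEIGHBORS:
            ny, nx = cy + dy, cx + dx
            if 0 <= ny < h and 0 <= nx < w and text_mask[ny][nx] and not out[ny][nx]:
                out[ny][nx] = out[cy][cx]
                queue.append((ny, nx))
    return out


if __name__ == "__main__":
    # Two words whose predicted text regions touch: one merged 3x8 blob.
    text = [[1] * 8 for _ in range(3)]
    kernels = [[0, 0, 0, 0, 0, 0, 0, 0],
               [0, 1, 1, 0, 0, 1, 1, 0],
               [0, 0, 0, 0, 0, 0, 0, 0]]
    labels, k = label_kernels(kernels)
    for row in assign_boundary_pixels(text, labels):
        print(row)  # each row prints [1, 1, 1, 1, 2, 2, 2, 2]
    print(round(shrink_offset([(0, 0), (10, 0), (10, 10), (0, 10)]), 6))  # 2.1
```

In the usage example, the merged blob is split down the middle because each boundary pixel is owned by whichever kernel is closer. In a full pipeline the offset d would be handed to a polygon-offsetting routine (for example the Clipper library, available in Python as pyclipper) to rasterize each kernel; that step is omitted to keep the sketch dependency-free. Breadth-first growth is only one possible attribution rule, chosen here because each reachable text pixel ends up with exactly one instance id, which is what keeps touching words apart.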


Figs. 1–10



Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant Nos. U21B2041 and 61825603, and by the National Key R&D Program of China (2020YFB2103902).

Author information

Corresponding author

Correspondence to Qi Wang.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Han, X., Gao, J., Yuan, Y. et al. Text kernel calculation for arbitrary shape text detection. Vis Comput 40, 2641–2654 (2024). https://doi.org/10.1007/s00371-023-02963-2

  • DOI: https://doi.org/10.1007/s00371-023-02963-2
