Text kernel calculation for arbitrary shape text detection

Han, Xu; Gao, Junyu; Yuan, Yuan; Wang, Qi

doi:10.1007/s00371-023-02963-2

Original article
Published: 30 June 2023

The Visual Computer, Volume 40, pages 2641–2654 (2024)

Xu Han^1,2, Junyu Gao^2, Yuan Yuan^2 & Qi Wang^2


Abstract

With the rapid progress of deep learning, text detection has attracted growing attention and advanced considerably. Current mainstream approaches are usually based on instance segmentation, which labels each pixel as text or non-text and can therefore cope with arbitrary-shaped text. However, pixel-based prediction often causes neighboring texts to overlap, resulting in misdetection. To mitigate this problem, we propose an approach that calculates text kernels and determines the attribution of boundary pixels. In this way, all texts are labeled uniformly, which facilitates model learning and effectively separates adherent texts. In addition, to cope with the complex and variable backgrounds of text, we propose a practical feature enhancement module. The module explores features at different levels to represent text of diverse sizes. Compared with current advanced algorithms, our method is competitive, achieving F1-measures of 87.3%, 88.0%, 82.8%, 85.7%, and 90.0% on the ICDAR2015, MSRA-TD500, CTW1500, Total-Text, and ICDAR2013 datasets, respectively.
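
For readers unfamiliar with the reported metric, the F1-measure is the harmonic mean of precision and recall. The Python sketch below shows only that arithmetic. The function name `f_measure` and the precision and recall values are illustrative assumptions, not figures from the article, and the sketch does not reproduce the benchmark-specific matching rules that decide which detections count as correct.

```python
def f_measure(precision: float, recall: float) -> float:
    """Return the F1-measure, i.e. the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Both inputs are zero, so the harmonic mean is defined here as zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical precision and recall, chosen only to illustrate the arithmetic.
example = f_measure(0.90, 0.80)
print(f"F1 = {example:.3f}")
```

Because the harmonic mean is dominated by the smaller of its two inputs, a detector cannot reach a high F1-measure by trading away recall for precision or the reverse.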



Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant Nos. U21B2041 and 61825603, and by the National Key R&D Program of China (2020YFB2103902).

Author information

Authors and Affiliations

School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072, Shaanxi, China
Xu Han
School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi’an, 710072, Shaanxi, China
Xu Han, Junyu Gao, Yuan Yuan & Qi Wang

Corresponding author

Correspondence to Qi Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Han, X., Gao, J., Yuan, Y. et al. Text kernel calculation for arbitrary shape text detection. Vis Comput 40, 2641–2654 (2024). https://doi.org/10.1007/s00371-023-02963-2

Accepted: 12 May 2023
Published: 30 June 2023
Issue Date: April 2024
DOI: https://doi.org/10.1007/s00371-023-02963-2
