Abstract
Segmentation-based methods have recently been widely adopted in scene text detection because they predict the shape of scene text at the pixel level more accurately than other methods. However, the complicated feature aggregation or label assignment algorithms used in current segmentation-based methods significantly reduce detection speed as accuracy improves. In this paper, we present an Adaptive Double Pyramid Network (ADPNet) for real-time detection of arbitrary-shaped text. ADPNet employs a Double Feature Enhancement Pyramid that uses Packet Downsampling Units (PDUnits) to enhance feature maps with minimal processing. We validate ADPNet on three benchmark datasets, and the results show that it achieves state-of-the-art performance in both speed and accuracy. Specifically, the proposed network achieves an F-measure of 85.7% while running at 40.5 fps on the ICDAR2015 dataset.
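For readers unfamiliar with the two reported quantities, the following is a minimal sketch of how an F-measure and a per-image latency are typically computed. It assumes the F-measure is the standard harmonic mean of precision and recall, which the abstract does not state explicitly. The function names `f_measure` and `latency_ms` are hypothetical, and the precision and recall values are illustrative placeholders, not figures from the paper.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F-measure)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing is detected correctly
    return 2 * precision * recall / (precision + recall)


def latency_ms(fps: float) -> float:
    """Average processing time per image, in milliseconds, for a given frame rate."""
    return 1000.0 / fps


# Placeholder precision/recall chosen for illustration only (not from the paper).
example_f = f_measure(0.9, 0.8)

# 40.5 fps is the speed reported in the abstract for ICDAR2015.
per_image_ms = latency_ms(40.5)

print(f"F-measure: {example_f:.4f}")
print(f"Latency:   {per_image_ms:.1f} ms per image")
```

Under this reading, the reported 40.5 fps corresponds to roughly 24.7 ms of processing per image.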
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 61404083) and State Key Laboratory of ASIC & System (2021KF010).
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
About this article
Cite this article
Zhou, W., Song, W. Real-Time Accurate Text Detection with Adaptive Double Pyramid Network. Neural Process Lett 55, 5055–5067 (2023). https://doi.org/10.1007/s11063-022-11080-5
DOI: https://doi.org/10.1007/s11063-022-11080-5