Arbitrary-shaped text detection with adaptive convolution and path enhancement pyramid network

Cheng, Qi; Wang, Guodong; Dong, Qian; Wei, Bin

doi:10.1007/s11042-020-09440-1

Published: 10 August 2020

Multimedia Tools and Applications, Volume 79, pages 29225–29242 (2020)

Qi Cheng¹,
Guodong Wang¹,
Qian Dong² &
…
Bin Wei³

Abstract

Scene text detection, an essential component of scene text reading, has recently become an active research field. Segmentation-based methods are especially common, since segmentation results can describe text of arbitrary shape. However, curved text varies widely in shape, scale and orientation, which makes it difficult to locate. The detector therefore needs to adjust the size of its local receptive fields adaptively, aggregating multi-scale spatial information so that it can locate curved text instances accurately. Moreover, low-level features are critical for localizing large text instances, yet when a Feature Pyramid Network (FPN) is used for multi-scale feature fusion, the long path from the low level to the top level impedes the flow of accurate localization signals. To solve these two problems, this paper proposes an Adaptive Convolution and Path Enhancement Pyramid Network (ACPEPNet), which locates text instances of arbitrary shape more accurately. First, an Adaptive Convolution Unit is introduced to improve the ability of the backbone to aggregate multi-scale spatial information within the same stage. This unit is a lightweight component that adds little computational overhead, and we build a backbone network for text feature extraction on it. Second, the original FPN structure is redesigned to build a short path from the low level to the top level: the path is changed from a one-way flow to a two-way flow, and the original features are added at the final stage of information fusion. Experiments on CTW1500, Total-Text, ICDAR 2015 and MSRA-TD500 validate the robustness of the proposed method. Without bells and whistles, the method achieves an F-measure of 80.8% on CTW1500 without external training data.
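The two-way flow described in the abstract can be illustrated with a toy sketch. The abstract does not specify the operators, so everything concrete here is an assumption: nearest-neighbour upsampling, 2x2 average pooling and element-wise addition stand in for the learned layers, the maps are single-channel, and the names `two_way_fusion`, `upsample2x`, `downsample2x` and `constant_map` are hypothetical. It is meant only to show the order of the passes (top-down, then bottom-up, then re-adding the original features), not the authors' implementation.

```python
# Toy sketch of two-way pyramid fusion on single-channel 2D maps (lists of lists).
# Assumptions (not from the paper): nearest-neighbour upsampling, 2x2 average
# pooling and element-wise addition replace the learned convolutional layers.

def upsample2x(x):
    """Nearest-neighbour upsampling: each value becomes a 2x2 block."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def downsample2x(x):
    """2x2 average pooling; height and width must be even."""
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]

def add(a, b):
    """Element-wise sum of two maps of the same size."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]

def two_way_fusion(features):
    """Fuse maps ordered from fine (low-level) to coarse (top-level).

    Each level is assumed to have half the resolution of the previous one.
    """
    n = len(features)
    # Top-down pass, as in a plain FPN: coarse information flows to fine levels.
    top_down = [None] * n
    top_down[-1] = features[-1]
    for i in range(n - 2, -1, -1):
        top_down[i] = add(features[i], upsample2x(top_down[i + 1]))
    # Bottom-up pass: a short path carrying low-level signals to the top level.
    bottom_up = [None] * n
    bottom_up[0] = top_down[0]
    for i in range(1, n):
        bottom_up[i] = add(top_down[i], downsample2x(bottom_up[i - 1]))
    # Final fusion: re-add the original features at every level.
    return [add(b, f) for b, f in zip(bottom_up, features)]

def constant_map(size, value):
    """Square map filled with a single value, used as dummy input."""
    return [[value] * size for _ in range(size)]

levels = [constant_map(4, 1.0), constant_map(2, 2.0), constant_map(1, 3.0)]
fused = two_way_fusion(levels)
for level in fused:
    print(len(level), "x", len(level[0]), "value", level[0][0])
```

With constant inputs the arithmetic is easy to follow by hand: the finest level accumulates contributions from every coarser level in the top-down pass, and the coarsest level then receives them back through the bottom-up pass.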

Acknowledgements

This work is supported by the Natural Science Foundation of Shandong Province (ZR2019MF050) and by the Shandong Province Colleges and Universities Youth Innovation Technology Plan Innovation Team Project (Grant No. 2020KJN011).

Author information

Authors and Affiliations

College of Computer Science and Technology, Qingdao University, Qingdao, China
Qi Cheng & Guodong Wang
Department of Pediatric Surgery, The Affiliated Hospital of Qingdao University, Qingdao, China
Qian Dong
Shandong Key Laboratory of Digital Medicine and Computer Assisted Surgery, The Affiliated Hospital of Qingdao University, Qingdao, China
Bin Wei

Corresponding author

Correspondence to Guodong Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Cheng, Q., Wang, G., Dong, Q. et al. Arbitrary-shaped text detection with adaptive convolution and path enhancement pyramid network. Multimed Tools Appl 79, 29225–29242 (2020). https://doi.org/10.1007/s11042-020-09440-1

Received: 27 December 2019
Revised: 22 July 2020
Accepted: 28 July 2020
Published: 10 August 2020
Issue Date: October 2020
DOI: https://doi.org/10.1007/s11042-020-09440-1
