A spatial feature adaptive network for text detection

Tang, Qingsong; Feng, Xiaoxu; Zhang, Xiangde

doi:10.1007/s11042-022-12619-3

Published: 28 February 2022

Volume 81, pages 15285–15302 (2022)

Multimedia Tools and Applications

Abstract

Because they can detect text of arbitrary shape and are robust in practical applications, segmentation-based scene text detection methods have attracted increasing attention. Accurate segmentation and effective feature extraction are at the core of segmentation-based detection. To refine the segmentation result, we replace the convolution in the first block of ResNet50 with a desubpixel convolution, which strengthens the feature extraction capability of the network. We also propose a spatial adaptive convolutional network that adjusts the features extracted by the backbone so that they are better suited to natural scene text detection. We implement the proposed network on top of PSENet. Results on ICDAR2015 and SCUT-CTW1500 show that our module improves text detection performance: precision, recall and F-measure reach 87.27%, 84.88% and 86.06% on ICDAR2015, and 81.99%, 82.63% and 82.31% on CTW1500. Our code will be available at https://github.com/fengdashuai/Ada-PSENet.
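
The abstract does not spell out the desubpixel operation. The following is a minimal sketch, assuming it refers to the usual space-to-depth rearrangement (the inverse of sub-pixel, or pixel-shuffle, upsampling), in which each r×r spatial block is folded into the channel axis so that resolution drops without discarding any values. The function names `desubpixel` and `subpixel` and the nested-list layout `[channel][row][column]` are illustrative choices, not taken from the authors' code, and the sketch does not show how the paper combines this rearrangement with a convolution inside ResNet50.

```python
def desubpixel(x, r=2):
    """Space-to-depth: [C][H][W] -> [C*r*r][H//r][W//r], keeping every value."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    if h % r or w % r:
        raise ValueError("height and width must be divisible by r")
    # Output channel index is ch*r*r + i*r + j, where (i, j) is the offset
    # of the sampled pixel inside each r-by-r block.
    return [
        [[x[ch][hb * r + i][wb * r + j] for wb in range(w // r)]
         for hb in range(h // r)]
        for ch in range(c)
        for i in range(r)
        for j in range(r)
    ]


def subpixel(y, r=2):
    """Depth-to-space, the inverse rearrangement: [C*r*r][H][W] -> [C][H*r][W*r]."""
    c, h, w = len(y) // (r * r), len(y[0]), len(y[0][0])
    return [
        [[y[ch * r * r + (row % r) * r + (col % r)][row // r][col // r]
          for col in range(w * r)]
         for row in range(h * r)]
        for ch in range(c)
    ]


# Toy input: 2 channels, 4 rows, 6 columns, each value encoding its own position.
x = [[[ch * 100 + row * 10 + col for col in range(6)] for row in range(4)]
     for ch in range(2)]
y = desubpixel(x, r=2)
print(len(y), len(y[0]), len(y[0][0]))  # channels, rows, columns after folding
print(subpixel(y, r=2) == x)            # the rearrangement is lossless
```

Unlike a strided convolution or pooling layer, this downsampling is exactly invertible, which is one plausible reason to prefer it in an early backbone block. Deep learning frameworks usually ship an equivalent built-in (for example `torch.nn.PixelUnshuffle` in PyTorch), which would be the practical choice over hand-written loops.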

Acknowledgements

We thank the reviewers for giving us helpful suggestions.

Author information

Authors and Affiliations

College of Sciences, Northeastern University, Shenyang, People’s Republic of China
Qingsong Tang, Xiaoxu Feng & Xiangde Zhang

Corresponding author

Correspondence to Qingsong Tang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

The list of abbreviations

Abbreviation	Meaning
OCR	Optical character recognition
ResNet	Residual network
PSENet	Progressive scale expansion network
PAN	Pixel aggregation network
DBNet	Scene text detection with differentiable binarization
SSD	Single shot multibox detector
CTPN	Connectionist text proposal network
PMTD	Pyramid mask text detector
SPCNet	Supervised pyramid context network
Conv	Convolution
Ada-PSENet	A spatial feature adaptive network for text detection (based on PSENet)
FPN	Feature pyramid networks
CNNs	Convolutional neural networks
P	Precision
R	Recall
F	F-measure
SGD	Stochastic gradient descent
SegLink	Detecting oriented text in natural images by linking segments
SLPR	Sliding line point regression for shape robust scene text detection
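
The P, R and F entries above are linked by the standard definition of the F-measure as the harmonic mean of precision and recall, F = 2PR / (P + R). As a quick consistency check, the sketch below recomputes F from the precision and recall values quoted in the abstract. It assumes the paper uses this balanced (F1) form; the helper name `f_measure` and the `reported` dictionary are illustrative only.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) in percent, as quoted in the abstract.
reported = {
    "ICDAR2015": (87.27, 84.88, 86.06),
    "CTW1500": (81.99, 82.63, 82.31),
}

for dataset, (p, r, f) in reported.items():
    recomputed = f_measure(p, r)
    print(f"{dataset}: recomputed F = {recomputed:.2f}, reported F = {f:.2f}")
```

For both datasets the recomputed value agrees with the reported F-measure to two decimal places.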

About this article

Cite this article

Tang, Q., Feng, X. & Zhang, X. A spatial feature adaptive network for text detection. Multimed Tools Appl 81, 15285–15302 (2022). https://doi.org/10.1007/s11042-022-12619-3

Received: 23 October 2020
Revised: 12 February 2021
Accepted: 09 February 2022
Published: 28 February 2022
Issue Date: May 2022
DOI: https://doi.org/10.1007/s11042-022-12619-3
