
A spatial feature adaptive network for text detection

Multimedia Tools and Applications

Abstract

Because they can detect text of arbitrary shape and are robust in practical applications, segmentation-based scene text detection methods have attracted increasing attention. More accurate segmentation and better feature extraction are the core of segmentation-based detection. To refine the segmentation result, we replace the convolution in the first block of ResNet50 with a desubpixel convolution, which strengthens the feature extraction capability of the network. We also propose a spatial adaptive convolutional network that adjusts the features extracted by the backbone so that they are better suited to natural scene text detection. We implement the proposed network on top of PSENet. Results on ICDAR2015 and SCUT-CTW1500 show that our module improves text detection performance: precision, recall and F-measure reach 87.27%, 84.88% and 86.06% on ICDAR2015, and 81.99%, 82.63% and 82.31% on CTW1500. Our code will be available at https://github.com/fengdashuai/Ada-PSENet.
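
As a rough illustration of the desubpixel idea mentioned in the abstract, the sketch below shows only the rearrangement step: each r x r spatial block is moved into the channel dimension, so resolution drops by a factor of r while every input value is kept. This is not the authors' implementation. The function names `desubpixel` and `subpixel`, the nested-list layout [C][H][W], and the factor r = 2 are assumptions made for illustration, and the abstract does not say how the rearrangement is combined with the convolution in the first ResNet50 block.

```python
def desubpixel(x, r=2):
    """Rearrange a [C][H][W] nested list into [C*r*r][H//r][W//r].

    Every input value is kept: each r x r spatial block is spread across
    r*r output channels instead of being pooled or skipped.
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    if h % r or w % r:
        raise ValueError("height and width must be divisible by r")
    out = []
    for ch in range(c):
        for i in range(r):        # row offset inside each block
            for j in range(r):    # column offset inside each block
                out.append([[x[ch][hh * r + i][ww * r + j]
                             for ww in range(w // r)]
                            for hh in range(h // r)])
    return out


def subpixel(y, r=2):
    """Inverse rearrangement, used here only to show that nothing was lost."""
    c = len(y) // (r * r)
    h, w = len(y[0]), len(y[0][0])
    out = [[[None] * (w * r) for _ in range(h * r)] for _ in range(c)]
    for k, plane in enumerate(y):
        ch, rem = divmod(k, r * r)   # source channel and block offset index
        i, j = divmod(rem, r)        # row and column offset inside the block
        for hh in range(h):
            for ww in range(w):
                out[ch][hh * r + i][ww * r + j] = plane[hh][ww]
    return out


# One-channel 4x4 toy image holding the values 0..15.
image = [[[row * 4 + col for col in range(4)] for row in range(4)]]
packed = desubpixel(image)      # 4 channels, each 2x2
restored = subpixel(packed)     # back to 1 channel of 4x4
print(len(packed), len(packed[0]), len(packed[0][0]))
print(restored == image)
```

Unlike a strided convolution or pooling layer, this downsampling is exactly invertible, which is the property that makes it attractive as a replacement for an early lossy reduction step.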

Acknowledgements

We thank the reviewers for giving us helpful suggestions.

Author information

Corresponding author

Correspondence to Qingsong Tang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

List of abbreviations

Abbreviation   Meaning
OCR            Optical character recognition
ResNet         Residual network
PSENet         Progressive scale expansion network
PAN            Pixel aggregation network
DBNet          Scene text detection with differentiable binarization
SSD            Single shot multibox detector
CTPN           Connectionist text proposal network
PMTD           Pyramid mask text detector
SPCNet         Supervised pyramid context network
Conv           Convolution
Ada-PSENet     A spatial feature adaptive network for text detection (based on PSENet)
FPN            Feature pyramid networks
CNNs           Convolutional neural networks
P              Precision
R              Recall
F              F-measure
SGD            Stochastic gradient descent
SegLink        Detecting oriented text in natural images by linking segments
SLPR           Sliding line point regression for shape robust scene text detection
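
The abbreviations P, R and F above stand for precision, recall and F-measure. Assuming the usual definition of F-measure as the harmonic mean of precision and recall (this page does not state the formula), the figures reported in the abstract can be checked with a short sketch. The helper name `f_measure` and the dictionary layout are illustrative choices, not taken from the paper or its code.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed definition of F)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) in percent, copied from the abstract.
reported = {
    "ICDAR2015": (87.27, 84.88, 86.06),
    "CTW1500": (81.99, 82.63, 82.31),
}

for dataset, (p, r, f_reported) in reported.items():
    f_computed = f_measure(p, r)
    # Compare the recomputed value with the reported one at two decimals.
    print(dataset, round(f_computed, 2), f_reported)
```

Under that assumption, both recomputed values agree with the reported F-measures to two decimal places.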

About this article

Cite this article

Tang, Q., Feng, X. & Zhang, X. A spatial feature adaptive network for text detection. Multimed Tools Appl 81, 15285–15302 (2022). https://doi.org/10.1007/s11042-022-12619-3

  • DOI: https://doi.org/10.1007/s11042-022-12619-3
