Abstract
Segmentation-based scene text detection methods have attracted growing attention because they can detect text of arbitrary shape and are robust in practical applications. Accurate segmentation and strong feature extraction are the core of segmentation-based detection. To refine the segmentation result, we replace the convolution in the first block of ResNet50 with a desubpixel convolution, which enhances the feature extraction capability of the network. We also propose a spatial adaptive convolutional network that adjusts the features extracted by the backbone so that they are better suited to natural scene text detection. We implement the proposed network on top of PSENet. Results on ICDAR2015 and SCUT-CTW1500 demonstrate that our module improves text detection performance: precision, recall, and F-measure reach 87.27%, 84.88%, and 86.06% on ICDAR2015, and 81.99%, 82.63%, and 82.31% on CTW1500. Our code will be available at https://github.com/fengdashuai/Ada-PSENet.
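To make the desubpixel operation concrete, the following is a minimal NumPy sketch of the rearrangement it relies on (often called space-to-depth, the inverse of the subpixel or pixel-shuffle operation). It shows only the lossless downsampling step. The function names `desubpixel` and `subpixel`, the factor `r = 2`, and the channel ordering are assumptions made for illustration; the abstract does not specify how the layer is configured inside the first ResNet50 block, so this should not be read as the authors' implementation.

```python
import numpy as np


def desubpixel(x, r=2):
    """Rearrange a (C, H, W) array into (C*r*r, H/r, W/r) without losing values.

    Each r x r spatial patch is moved into the channel axis, so the spatial
    resolution drops by r while every input value is kept (hypothetical helper).
    """
    c, h, w = x.shape
    if h % r or w % r:
        raise ValueError("height and width must be divisible by r")
    x = x.reshape(c, h // r, r, w // r, r)   # split H and W into (block, offset)
    x = x.transpose(0, 2, 4, 1, 3)           # move the two offsets next to C
    return x.reshape(c * r * r, h // r, w // r)


def subpixel(y, r=2):
    """Inverse rearrangement: (C*r*r, H, W) back to (C, H*r, W*r)."""
    crr, h, w = y.shape
    c = crr // (r * r)
    y = y.reshape(c, r, r, h, w)
    y = y.transpose(0, 3, 1, 4, 2)           # interleave offsets with H and W
    return y.reshape(c, h * r, w * r)


# Toy input: 2 channels, 4 x 6 pixels.
x = np.arange(2 * 4 * 6).reshape(2, 4, 6)
y = desubpixel(x, r=2)
print(y.shape)                               # channels grow by r*r, H and W shrink by r
print(np.array_equal(subpixel(y, r=2), x))   # the round trip restores the input
```

Unlike a strided convolution or pooling, this downsampling discards no input values, so a convolution applied afterwards still has access to every original pixel, only at a lower spatial resolution and with more channels.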
Acknowledgements
We thank the reviewers for giving us helpful suggestions.
Appendix
The list of abbreviations
Abbreviation | Meaning |
---|---|
OCR | Optical character recognition |
ResNet | Residual network |
PSENet | Progressive scale expansion network |
PAN | Pixel aggregation network |
DBNet | Scene text detection with differentiable binarization |
SSD | Single shot multibox detector |
CTPN | Connectionist text proposal network |
PMTD | Pyramid mask text detector |
SPCNet | Supervised pyramid context network |
Conv | Convolution |
Ada-PSENet | A spatial feature adaptive network for text detection (based on PSENet) |
FPN | Feature pyramid networks |
CNNs | Convolutional neural networks |
P | Precision |
R | Recall |
F | F-measure |
SGD | Stochastic gradient descent |
SegLink | Detecting oriented text in natural images by linking segments |
SLPR | Sliding line point regression for shape robust scene text detection |
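The P, R, and F entries in the table are related: assuming the standard definition of the F-measure as the harmonic mean of precision and recall, the figures reported in the abstract can be checked with a few lines of arithmetic. The helper name `f_measure` is hypothetical and is introduced here only for illustration.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) as reported in the abstract.
icdar2015 = f_measure(87.27, 84.88)
ctw1500 = f_measure(81.99, 82.63)

print(round(icdar2015, 2))  # 86.06, matching the reported ICDAR2015 F-measure
print(round(ctw1500, 2))    # 82.31, matching the reported CTW1500 F-measure
```

Both values agree with the reported F-measures to two decimal places, which is consistent with the harmonic-mean definition assumed here.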
Cite this article
Tang, Q., Feng, X. & Zhang, X. A spatial feature adaptive network for text detection. Multimed Tools Appl 81, 15285–15302 (2022). https://doi.org/10.1007/s11042-022-12619-3
DOI: https://doi.org/10.1007/s11042-022-12619-3