ABSTRACT
Segmentation-based methods are an important direction in scene text detection: they can detect arbitrarily shaped and curved text, and they have attracted increasing attention from researchers. However, prior research has shown that segmentation-based methods are disturbed by adjoining pixels and cannot effectively identify text boundaries. To tackle this problem, we propose ResAsapp Conv, a convolution structure built on the PSE algorithm. It provides fields of view at different scales over an object, which helps the network recognize text boundaries effectively. We validate the method on three benchmark datasets: CTW1500, Total-Text, and ICDAR2015. In particular, on CTW1500, a dataset full of long curved text in all kinds of scenes that is hard to distinguish, our network achieves an F-measure of 81.2%.
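The abstract does not spell out how ResAsapp Conv obtains fields of view at different scales. One common way to do this, used in DeepLab-style atrous spatial pyramid pooling, is to run parallel convolutions with different dilation rates. The sketch below only illustrates that general idea, assuming 3x3 kernels and the dilation rates (1, 6, 12, 18); these values and the function names are illustrative assumptions, not the paper's configuration.

```python
# Illustrative sketch: how dilation widens the area a single kernel covers.
# The dilation rates below are an assumption borrowed from DeepLab-style
# ASPP; the abstract does not state which rates ResAsapp Conv uses.

ASSUMED_DILATIONS = (1, 6, 12, 18)


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span, in input pixels, covered by one dilated convolution kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def branch_fields(kernel_size: int = 3, dilations=ASSUMED_DILATIONS) -> dict:
    """Map each dilation rate to the span its kernel covers."""
    return {d: effective_kernel_size(kernel_size, d) for d in dilations}


if __name__ == "__main__":
    for d, span in branch_fields().items():
        print(f"dilation={d:2d}: 3x3 kernel spans {span}x{span} input pixels")
```

Under these assumptions, the same 3x3 kernel covers spans from 3 to 37 pixels without adding parameters, so parallel branches see both fine boundary detail and wider context for each pixel.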
Index Terms
- ResAsapp: An Effective Convolution to Distinguish Adjacent Pixels For Scene Text Detection