ResAsapp: An Effective Convolution to Distinguish Adjacent Pixels For Scene Text Detection

Authors:
Kangming Weng, Fujian Key Laboratory of Pattern Recognition and Image Understanding, Xiamen University of Technology, China (ORCID: 0000-0002-0537-017X)
Xia Du, Fujian Key Laboratory of Pattern Recognition and Image Understanding, Xiamen University of Technology, China (ORCID: 0000-0002-6298-846X)
Kunze Chen, Fujian Key Laboratory of Pattern Recognition and Image Understanding, Xiamen University of Technology, China (ORCID: 0000-0003-2201-3999)
Dahan Wang, Fujian Key Laboratory of Pattern Recognition and Image Understanding, Xiamen University of Technology, China (ORCID: 0000-0002-5901-0778)
Shunzhi Zhu, Fujian Key Laboratory of Pattern Recognition and Image Understanding, Xiamen University of Technology, China (ORCID: 0000-0001-9715-4281)

ICCPR '22: Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition, November 2022, Pages 328–333, https://doi.org/10.1145/3581807.3581854

Published: 22 May 2023

ABSTRACT

Segmentation-based methods are an important direction in scene text detection because they can detect arbitrarily shaped and curved text, and they have attracted increasing research attention. However, prior work has shown that segmentation-based methods are disturbed by adjoining pixels and therefore cannot identify text boundaries effectively. To tackle this problem, we propose ResAsapp Conv, a convolution structure based on the PSE algorithm. It provides visual fields at different scales for each object, which helps the network recognize text boundaries effectively. We validate the method on three benchmark datasets: CTW1500, Total-Text, and ICDAR2015. In particular, on CTW1500, a dataset full of long curved text in all kinds of scenes that is hard to distinguish, our network achieves an F-measure of 81.2%.
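
The abstract does not describe the internals of the layer, so the following is only a sketch of one common way to obtain visual fields at several scales: parallel dilated (atrous) convolutions whose outputs are fused and added back to the input through a residual connection. Reading the name ResAsapp this way is an assumption, not the authors' stated design. The function names, the 1-D setting, the kernel weights, and the dilation rates (1, 2, 4) are all hypothetical.

```python
# Hypothetical sketch: parallel dilated 1-D convolutions with a residual sum.
# This is NOT the paper's ResAsapp layer; names and rates are illustrative only.

def effective_kernel(kernel_size, dilation):
    """Span in pixels covered by one kernel of the given size and dilation."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)

def dilated_conv1d(signal, weights, dilation):
    """Same-length 1-D dilated cross-correlation with zero padding."""
    half = (len(weights) // 2) * dilation
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = i - half + j * dilation  # taps are spaced `dilation` apart
            if 0 <= idx < len(signal):     # out-of-range taps read as zero
                acc += w * signal[idx]
        out.append(acc)
    return out

def residual_multiscale(signal, weights, dilations):
    """Average the parallel dilated branches, then add the input back."""
    branches = [dilated_conv1d(signal, weights, d) for d in dilations]
    fused = [sum(vals) / len(branches) for vals in zip(*branches)]
    return [x + f for x, f in zip(signal, fused)]

rates = [1, 2, 4]                                # assumed dilation rates
spans = [effective_kernel(3, d) for d in rates]  # pixels seen by each branch
impulse = [0, 0, 0, 1, 0, 0, 0]
response = residual_multiscale(impulse, [1, 1, 1], rates)
```

With a 3-tap kernel, the three branches see 3, 5, and 9 pixels respectively, so a single layer mixes near and far context around each position while the residual term keeps the original pixel value in the output.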

Index Terms

Computing methodologies > Artificial intelligence > Computer vision > Computer vision tasks > Scene understanding

Published in

ICCPR '22: Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition
November 2022
683 pages
ISBN: 9781450397056
DOI: 10.1145/3581807

Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Curved Text
Pixel Interference
Scene Text Detection
Qualifiers
- research-article
- Research
- Refereed limited