research-article

Exploring Anchor-Free Approach for Reading Chinese Characters

Published: 29 October 2023

Abstract

Scene text spotting has achieved impressive performance in recent years. Currently, most text localization methods operate on text-line instances. We argue that a character-level spotting network is better suited to recognizing Chinese text, which is common in scene text images. In this paper, we explore an anchor-free spotting framework that treats a character as a single point. To better capture Chinese character features, we first apply the Canny edge detector and superimpose the resulting edge information onto the RGB image channels. A feed-forward network is then set up so that inference runs in a single forward pass, without complex post-processing steps. Experiments are performed on a Chinese text dataset, and the quantitative comparisons demonstrate the effectiveness of the anchor-free approach.
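
The two steps named in the abstract, edge superimposition and point-based character localization, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that the edge map has already been computed (for example with a Canny detector such as OpenCV's `cv2.Canny`), that "superimpose" means additively blending the edges into each RGB channel, and that character centers are read from a single-channel heatmap as 3x3 local maxima. The function names `superimpose_edges` and `decode_points` and the parameters `weight` and `threshold` are hypothetical.

```python
import numpy as np


def superimpose_edges(rgb, edges, weight=0.5):
    """Blend a binary edge map into every RGB channel.

    Assumption: additive blending is only one possible reading of
    "superimpose"; the abstract does not specify the exact operation.
    rgb:   uint8 array of shape (H, W, 3)
    edges: array of shape (H, W), non-zero where an edge was detected
    """
    rgb = rgb.astype(np.float32)
    # Turn the edge map into a 0/255 layer with a trailing channel axis.
    edge_layer = (edges > 0).astype(np.float32)[..., None] * 255.0
    fused = np.clip(rgb + weight * edge_layer, 0, 255)
    return fused.astype(np.uint8)


def decode_points(heatmap, threshold=0.3):
    """Return (row, col, score) for each 3x3 local maximum above threshold.

    Assumption: the network outputs a 2D center heatmap with scores in
    [0, 1]; each surviving peak is treated as one character.
    """
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Stack the nine shifted views that make up each 3x3 neighbourhood.
    neighbours = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    )
    is_peak = (heatmap >= neighbours.max(axis=0)) & (heatmap > threshold)
    rows, cols = np.nonzero(is_peak)
    return [(int(r), int(c), float(heatmap[r, c])) for r, c in zip(rows, cols)]
```

Because the peak test is a single element-wise comparison, decoding needs no anchor boxes and no non-maximum suppression over box overlaps, which matches the abstract's claim of avoiding complex post-processing.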

Published In

McGE '23: Proceedings of the 1st International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice
October 2023
151 pages
ISBN:9798400702785
DOI:10.1145/3607541
General Chairs: Cheng Jin, Liang He, Mingli Song, Rui Wang

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. anchor-free approach
  2. chinese characters
  3. scene text spotting

Qualifiers

  • Research-article

Conference

MM '23