research-article

Exploring Anchor-Free Approach for Reading Chinese Characters

Authors:

Cheng Jin

McGE '23: Proceedings of the 1st International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice

Pages 23 - 28

https://doi.org/10.1145/3607541.3616813

Published: 29 October 2023

Abstract

Scene text spotting has achieved impressive performance in recent years. Currently, most text localization methods operate on text-line instances. We argue that a character-level spotting network is better suited to recognizing Chinese text, which is common in scene text images. In this paper, we explore an anchor-free spotting framework that treats each character as a single point. To better capture Chinese character features, we first apply the Canny edge detector and superimpose the resulting edge information onto the RGB image channels. A feed-forward network then performs inference in a single forward pass, without complex post-processing steps. Experiments on a Chinese text dataset and quantitative comparisons demonstrate the effectiveness of the anchor-free approach.
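The two steps named in the abstract, edge superimposition and point-based localization, can be illustrated with a minimal NumPy sketch. The abstract does not say how the edge map is combined with the RGB channels or how points are decoded, so everything here is an assumption: `superimpose_edges`, `decode_centers`, `alpha`, and `threshold` are hypothetical names, the additive blend is one possible reading of "superimpose", and the 3x3 local-maximum decoding is borrowed from common keypoint-based detectors. The edge map is taken as an input, for example the output of a Canny detector, so the sketch needs no image library.

```python
import numpy as np


def superimpose_edges(rgb, edges, alpha=0.5):
    """Blend an edge map into every channel of an RGB image.

    rgb:   uint8 array of shape (H, W, 3)
    edges: uint8 array of shape (H, W) with values 0 or 255,
           e.g. produced by a Canny edge detector
    alpha: blending weight for the edge map (hypothetical parameter)
    """
    if rgb.shape[:2] != edges.shape:
        raise ValueError("edge map must match the image height and width")
    # Add the weighted edge map to each channel, then clip back to uint8 range.
    blended = rgb.astype(np.float32) + alpha * edges.astype(np.float32)[..., None]
    return np.clip(blended, 0, 255).astype(np.uint8)


def decode_centers(heatmap, threshold=0.3):
    """Return (row, col, score) for each local maximum at or above threshold.

    heatmap: float array of shape (H, W) holding per-pixel center scores.
    A pixel is kept when it equals the maximum of its 3x3 neighbourhood,
    so no anchor boxes or box-level suppression are involved.
    """
    h, w = heatmap.shape
    padded = np.pad(heatmap, 1, mode="constant", constant_values=-np.inf)
    # Stack the nine shifted views and take the element-wise maximum.
    neighbourhood = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    )
    local_max = neighbourhood.max(axis=0)
    keep = (heatmap == local_max) & (heatmap >= threshold)
    rows, cols = np.nonzero(keep)
    return [(int(r), int(c), float(heatmap[r, c])) for r, c in zip(rows, cols)]
```

In this sketch each decoded point stands for one character center. A full spotter would also predict a character class and possibly a size at each point, which the abstract does not detail.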


Index Terms

Exploring Anchor-Free Approach for Reading Chinese Characters
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object detection
      2. Computer vision representations
        Appearance and texture representations


Information & Contributors

Information

Published In


McGE '23: Proceedings of the 1st International Workshop on Multimedia Content Generation and Evaluation: New Methods and Practice

October 2023

151 pages

ISBN:9798400702785

DOI:10.1145/3607541

General Chairs:
Cheng Jin, Professor, Fudan University, China
Liang He, Professor, East China Normal University, China
Mingli Song, Professor, Zhejiang University, China
Rui Wang, Professor, IIE, Chinese Academy of Sciences, China

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article

Conference

MM '23

Sponsor:

SIGMM

MM '23: The 31st ACM International Conference on Multimedia

October 29, 2023

Ottawa ON, Canada
