research-article

Improving text recognition by combining visual and linguistic features of text

Authors:
Cong Tran, Posts and Telecommunications Institute of Technology, Viet Nam (ORCID: 0000-0001-9467-4978)
Khanh Nguyen-Trong, Posts and Telecommunications Institute of Technology, Viet Nam (ORCID: 0000-0001-5175-8805)
Cuong Pham, Posts and Telecommunications Institute of Technology, Viet Nam (ORCID: 0000-0003-0973-0889)
Dat Tran-Anh, Posts and Telecommunications Institute of Technology, Viet Nam (ORCID: 0000-0002-8924-4356)
Tien Nguyen-Thi-Tan, Thai Nguyen University of Medicine and Pharmacy, Viet Nam (ORCID: 0000-0002-6300-336X)

SoICT '22: Proceedings of the 11th International Symposium on Information and Communication Technology, December 2022, Pages 329–335, https://doi.org/10.1145/3568562.3568624

Published: 01 December 2022

ABSTRACT

Although Optical Character Recognition (OCR) has been studied for several decades, it continues to attract considerable attention from researchers. Previous studies tend to build OCR models from the visual features of optical text, such as texture, shape, and color. However, linguistic features, an important factor for OCR, have not been extensively investigated, especially for OCR of scanned Vietnamese documents. We therefore introduce a method that improves the performance of Vietnamese OCR by combining the visual and linguistic features of the optical text. The proposed method consists of (i) a domain-specific dictionary and (ii) a modified natural language processing model termed ABCNet, employed at the training and inference steps, to determine the best candidate for the visual appearance of the text. Moreover, our method can easily be integrated with existing OCR methods to further increase their performance. Experimental results on a newly collected dataset show that the proposed method achieves an accuracy of 83.61% and an F1 score of 84.1%.
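The abstract does not say how the dictionary is used to pick the best candidate. As a rough illustration only, the sketch below ranks dictionary entries by Levenshtein edit distance to a raw OCR output, which is one common way to do dictionary-guided correction. Everything in it is an assumption rather than the authors' method: the function names `edit_distance` and `best_candidates`, the parameter `k`, and the toy dictionary are all invented for this example.

```python
import unicodedata
from typing import Iterable, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def best_candidates(ocr_word: str,
                    dictionary: Iterable[str],
                    k: int = 3) -> List[Tuple[str, int]]:
    """Return the k dictionary entries closest to the OCR output.

    Strings are NFC-normalized before comparison so that a Vietnamese
    letter with diacritics counts as a single character.
    """
    query = unicodedata.normalize("NFC", ocr_word)
    scored = [
        (word, edit_distance(query, unicodedata.normalize("NFC", word)))
        for word in dictionary
    ]
    # Sort by distance first, then alphabetically for a stable order.
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:k]


# Hypothetical toy dictionary; the paper's actual dictionary is not shown.
TOY_DICTIONARY = ["bệnh viện", "bệnh nhân", "bác sĩ", "thuốc"]

if __name__ == "__main__":
    # An OCR output with two wrong tone marks.
    print(best_candidates("bênh viên", TOY_DICTIONARY, k=2))
```

In a full system the distance would presumably be combined with the recognizer's own confidence scores rather than used alone; this sketch covers only the dictionary lookup.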

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object recognition
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification

Published in

SoICT '22: Proceedings of the 11th International Symposium on Information and Communication Technology
December 2022
474 pages
ISBN:9781450397254
DOI:10.1145/3568562

Copyright © 2022 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Dictionary guided
OCR
Vietnamese text recognition
deep learning
Qualifiers
- research-article
- Research
- Refereed limited
Acceptance Rates
Overall Acceptance Rate: 147 of 318 submissions, 46%