DOI: 10.1145/3568562.3568624
research-article

Improving text recognition by combining visual and linguistic features of text

Published: 01 December 2022

ABSTRACT

Although Optical Character Recognition (OCR) has been studied for several decades, it still attracts considerable attention from researchers. Previous studies tend to build OCR models on visual features of optical text, such as texture, shape, and color. However, linguistic features, an important factor for OCR, have not been extensively investigated, especially for scanned Vietnamese documents. Therefore, we introduce a method to improve the performance of Vietnamese OCR by combining the visual and linguistic features of the optical text. The proposed method consists of (i) a domain-specific dictionary and (ii) a modified natural language processing model termed ABCNet, employed at the training and inference steps, to determine the best candidate for the visual appearance of the text. Moreover, our method can easily be integrated with existing OCR methods to further increase their performance. Experimental results on a newly collected dataset show that the proposed method achieves an accuracy of 83.61% and an F1 score of 84.1%.
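
The abstract does not spell out how the dictionary selects the best candidate, so the following is only a minimal sketch of one common form of dictionary-guided correction: snapping a raw OCR word to the nearest dictionary entry by edit distance. The function names, the `max_distance` threshold, and the toy dictionary are all assumptions made for illustration, and this is not the authors' pipeline, which also involves a trained model.

```python
import unicodedata
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic programme."""
    # Normalise so that composed and decomposed Vietnamese diacritics compare equal.
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def best_candidate(ocr_word: str, dictionary: Iterable[str],
                   max_distance: int = 2) -> str:
    """Return the closest dictionary entry within max_distance edits.

    Falls back to the raw OCR word when no entry is close enough.
    The threshold of 2 is an arbitrary illustrative choice.
    """
    best, best_dist = ocr_word, max_distance + 1
    for entry in dictionary:
        dist = edit_distance(ocr_word, entry)
        if dist < best_dist:
            best, best_dist = entry, dist
    return best


# Hypothetical receipt-domain dictionary (not taken from the paper).
toy_dictionary = ["hóa đơn", "tổng cộng", "thanh toán"]
# The OCR output has one wrong diacritic; the nearest entry is one edit away.
corrected = best_candidate("tống cộng", toy_dictionary)
```

A practical system would likely weight such a distance against the recognizer's visual confidence instead of relying on the dictionary alone, which is the kind of combination the abstract describes.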


Published in

SoICT '22: Proceedings of the 11th International Symposium on Information and Communication Technology
December 2022
474 pages
ISBN: 9781450397254
DOI: 10.1145/3568562

Copyright © 2022 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Qualifiers: research-article, Research, Refereed limited

Overall Acceptance Rate: 147 of 318 submissions, 46%
