DOI: 10.1145/3605423.3605431
research-article

Implementing the Tesseract Method for Information Extraction from Images

Published: 20 August 2023

ABSTRACT

Character recognition (CR) has been the subject of substantial research over the past half century and is now mature enough to support practical applications. Rapidly growing computing power makes it feasible to run existing Optical Character Recognition (OCR) approaches and is creating demand, across a wide variety of emerging application domains, for more advanced methods. Scanning a paper document and loading the digitized image into a computer system is one of the quickest and easiest ways to save text information. Once stored on the computer, the image can be altered if necessary. However, recognizing the text in a captured image is a difficult task. The Tesseract method has therefore been employed to extract text from images, which simplifies this process.
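The abstract does not say how Tesseract is invoked, so the following is only a minimal sketch of one common route: calling the Tesseract command-line program from Python and tidying its output. It assumes a Tesseract 4 or later executable on the PATH (earlier versions spell the page segmentation flag differently). The function names `build_tesseract_command`, `clean_ocr_text`, and `extract_text` are hypothetical helpers introduced here for illustration and are not taken from the paper.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, lang="eng", psm=3):
    """Build the argument list for the Tesseract CLI.

    Passing "stdout" as the output base makes Tesseract print the
    recognized text instead of writing a .txt file.
    """
    return ["tesseract", str(image_path), "stdout", "-l", lang, "--psm", str(psm)]


def clean_ocr_text(raw):
    """Trim each line of raw OCR output and drop lines that end up empty."""
    lines = (line.strip() for line in raw.splitlines())
    return "\n".join(line for line in lines if line)


def extract_text(image_path, lang="eng", psm=3):
    """Run Tesseract on one image file and return the cleaned text."""
    # Assumption: the tesseract executable is installed and on PATH.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    if not Path(image_path).is_file():
        raise FileNotFoundError(image_path)
    result = subprocess.run(
        build_tesseract_command(image_path, lang, psm),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if Tesseract fails
    )
    return clean_ocr_text(result.stdout)
```

With a scanned page saved under a placeholder name such as `scan.png`, calling `extract_text("scan.png")` would return the recognized text as a string. Recognition quality depends heavily on the image, so preprocessing steps such as binarization or deskewing may be needed in practice.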

Published in

ICCTA '23: Proceedings of the 2023 9th International Conference on Computer Technology Applications
May 2023
270 pages
ISBN: 9781450399579
DOI: 10.1145/3605423

Copyright © 2023 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
