ABSTRACT
Character recognition (CR) has been the subject of substantial research over the past half century and has now matured enough to support technology-driven applications. Rapidly growing computing power makes it feasible to run existing Optical Character Recognition (OCR) approaches and, at the same time, creates increasing demand from a wide variety of emerging application domains that call for more advanced methodologies. Scanning a paper document and entering the digitized image into a computer system is one of the quickest and easiest ways to save text information. Once stored on the computer, the text can be altered if necessary. However, recognizing the text in a recorded image is a very difficult challenge. The Tesseract method has therefore been employed to extract text from images, simplifying this process.
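The abstract does not describe how Tesseract is invoked, so the following is only a minimal sketch of extracting text from a scanned image, not the authors' implementation. It assumes the `tesseract` command-line tool is installed and on the PATH, and that the English language data (`eng`) is available. The function names `build_tesseract_command` and `extract_text` are hypothetical, chosen for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, lang="eng"):
    """Build the argument list for the tesseract command-line tool.

    Hypothetical helper: passing "stdout" as the output base asks the
    CLI to print the recognized text instead of writing a .txt file.
    """
    return ["tesseract", str(image_path), "stdout", "-l", lang]


def extract_text(image_path, lang="eng"):
    """Run Tesseract on one image file and return the recognized text."""
    # Assumption: the tesseract executable is installed and on the PATH.
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    if not Path(image_path).is_file():
        raise FileNotFoundError(str(image_path))
    result = subprocess.run(
        build_tesseract_command(image_path, lang),
        capture_output=True,  # collect the text printed by tesseract
        text=True,            # decode the output as str
        check=True,           # raise CalledProcessError on failure
    )
    return result.stdout.strip()
```

Calling `extract_text("scanned_page.png")` (a placeholder file name) would return the recognized text as a string, which can then be saved or edited like any other text. Recognition quality depends heavily on the image, so real use would likely need preprocessing that this sketch omits.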
Index Terms
- Implementing the Tesseract Method for Information Extraction from Images