Abstract:
Optical Character Recognition (OCR) is the process of extracting text from images with specialized software and transferring it to a digital environment. OCR quality directly affects the quality of most downstream natural language processing tasks. Many applications used in daily life, such as text classification, information extraction, and text summarization, operate on text extracted from images. Detecting and correcting text that was misrecognized during OCR is therefore an active research topic approached with many methods. This study aims to apply recent neural machine translation methods to detect and correct post-OCR text errors, and to observe the results on the dataset from the International Conference on Document Analysis and Recognition (ICDAR) 2019 competition on post-OCR error detection and correction.
Date of Conference: 15-18 May 2022
Date Added to IEEE Xplore: 29 August 2022
ISBN Information:
Print on Demand (PoD) ISSN: 2165-0608