Abstract:
In this paper, we analyze the Multilingual Automatic Document Classification Analysis and Translation (MADCAT) dataset, which is often used to train and test Arabic handwritten document recognition systems. We report the main challenges in MADCAT that the preprocessing stage of any recognition algorithm faces and that affect the performance of systems trained and tested on it. MADCAT is a representative dataset of Arabic handwritten documents, so investigating its challenges helps to identify the requirements of the preprocessing stage. After presenting these challenges, we review the literature and recommend preprocessing algorithms suitable for this dataset in handwritten Arabic word recognition systems such as JU-OCR2.
Published in: 2018 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI)
Date of Conference: 13-15 October 2018
Date Added to IEEE Xplore: 03 February 2019