Challenges and Preprocessing Recommendations for MADCAT Dataset of Handwritten Arabic Documents | IEEE Conference Publication | IEEE Xplore

Abstract:

In this paper, we analyze a dataset often used to train and test Arabic handwritten document recognition systems: the Multilingual Automatic Document Classification Analysis and Translation dataset (MADCAT). We report the main challenges in MADCAT that the preprocessing stage of any recognition algorithm must handle and that affect the performance of systems trained and tested on it. MADCAT is representative of Arabic handwritten documents, so investigating its challenges helps identify the requirements of the preprocessing stage. After presenting these challenges, we review the literature and recommend preprocessing algorithms suited to this dataset for handwritten Arabic word recognition systems such as JU-OCR2.
Date of Conference: 13-15 October 2018
Date Added to IEEE Xplore: 03 February 2019
Conference Location: Beijing, China
