Deep optical character recognition: a case of Pashto language

Shizza Zahoor; Saeeda Naz; Naila Habib Khan; Muhammad I. Razzak

doi:10.1117/1.JEI.29.2.023002

Journal of Electronic Imaging, Vol. 29, Issue 2, 023002 (March 2020). https://doi.org/10.1117/1.JEI.29.2.023002

Abstract

Over the past decades, text recognition technologies have focused largely on noncursive scripts with isolated characters. A text recognition system for the cursive Pashto script would be a significant contribution, allowing traditional, cultural, and educational Pashto literature to be converted into machine-readable form. We propose deep learning architectures based on transfer learning for the recognition of Pashto ligatures. For recognition analysis and evaluation, the ligature images in the dataset are augmented with negative, contour, and rotated versions of each image, which increases the variation of each sample and the size of the original dataset. Rich feature representations are extracted automatically from the Pashto ligature images by the deep convolutional layers of fine-tuned convolutional neural network (CNN) architectures. The pretrained CNN architectures AlexNet, GoogleNet, and VGG (VGG-16 and VGG-19) are used for classification, with the extracted features fed to a fully connected layer followed by a softmax layer. The proposed deep transfer learning approach achieves high recognition rates for Pashto ligatures on the benchmark FAST-NU Pashto dataset, with accuracies of 97.24%, 97.46%, and 99.03% for the AlexNet, GoogleNet, and VGGNet architectures, respectively.
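The augmentation step described in the abstract (negative, contour, and rotated copies of each ligature image) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' pipeline: images are assumed to be 8-bit grayscale with dark ink on a white background, the contour is approximated by a simple 4-neighbour boundary test, rotation uses nearest-neighbour sampling, and the hypothetical `augment` helper uses rotation angles of -10 and +10 degrees chosen purely for illustration (the paper's actual angles are not given here).

```python
import numpy as np


def negative(img: np.ndarray) -> np.ndarray:
    """Invert an 8-bit grayscale image: dark ink becomes light and vice versa."""
    return 255 - img


def contour(img: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Keep only the boundary pixels of the ink strokes.

    A pixel counts as boundary if it is ink (darker than `threshold`) and at
    least one of its 4-neighbours is background. The result is drawn as black
    boundary pixels (0) on a white background (255).
    """
    ink = img < threshold
    # Pad with background so ink touching the image border counts as boundary.
    p = np.pad(ink, 1, constant_values=False)
    # A pixel is interior when its up, down, left and right neighbours are all ink.
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    boundary = ink & ~interior
    out = np.full(img.shape, 255, dtype=np.uint8)
    out[boundary] = 0
    return out


def rotate(img: np.ndarray, degrees: float, fill: int = 255) -> np.ndarray:
    """Rotate about the image centre with nearest-neighbour sampling.

    The output keeps the input size; pixels that map outside the source are
    set to `fill` (white background). Positive angles turn the image clockwise
    on screen, because row indices (y) grow downward.
    """
    h, w = img.shape
    theta = np.deg2rad(degrees)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.indices((h, w))
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    # Inverse mapping: for every output pixel, locate its source pixel.
    src_x = np.rint(cos_t * (xs - cx) + sin_t * (ys - cy) + cx).astype(int)
    src_y = np.rint(-sin_t * (xs - cx) + cos_t * (ys - cy) + cy).astype(int)
    valid = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.full_like(img, fill)
    out[valid] = img[src_y[valid], src_x[valid]]
    return out


def augment(img: np.ndarray, angles=(-10, 10)) -> list:
    """Hypothetical helper: the original plus negative, contour and rotated variants."""
    samples = [img, negative(img), contour(img)]
    samples += [rotate(img, a) for a in angles]
    return samples


if __name__ == "__main__":
    img = np.full((32, 32), 255, dtype=np.uint8)
    img[10:22, 8:24] = 0  # a filled dark block standing in for a ligature
    variants = augment(img)
    print(len(variants))  # 5
```

Nearest-neighbour sampling keeps rotated images free of new grey anti-aliasing values, which suits near-binary ligature scans; an interpolating rotation from an image-processing library would be a reasonable alternative if smoother edges are preferred.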

Citation

Shizza Zahoor, Saeeda Naz, Naila Habib Khan, and Muhammad I. Razzak "Deep optical character recognition: a case of Pashto language," Journal of Electronic Imaging 29(2), 023002 (4 March 2020). https://doi.org/10.1117/1.JEI.29.2.023002

Received: 30 September 2019; Accepted: 14 February 2020; Published: 4 March 2020