Elsevier

Pattern Recognition

Volume 71, November 2017, Pages 78-93

A multi-scale deep quad tree based feature extraction method for the recognition of isolated handwritten characters of popular indic scripts

https://doi.org/10.1016/j.patcog.2017.05.022

Highlights

  • A novel multi-column multi-scale convolutional neural network (MMCNN) based architecture has been proposed.

  • A deep quad-tree based staggered prediction model has also been added to the model for recognition with an SVM.

  • A multiple-level tree network increases the recognition rate over a single CNN, as it votes through the softmax probabilities of all quadrants or decahexadrants.

  • The proposed techniques have been evaluated on 10 publicly available datasets of isolated handwritten characters or digits, including MNIST.

  • Promising results have been achieved by the proposed system for all of the datasets.
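
The quadrant-level voting mentioned in the highlights can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `quadtree_regions` and `soft_vote` are hypothetical, the image is assumed to be a 2-D NumPy array whose sides are divisible by 2**depth, and the per-quadrant probabilities are hard-coded stand-ins for the softmax outputs of trained networks.

```python
import numpy as np


def quadtree_regions(image, depth):
    """Split a 2-D array into 4**depth equal tiles, in row-major order.

    depth=0 returns the whole image, depth=1 the four quadrants,
    depth=2 the sixteen sub-quadrants ("decahexadrants").
    """
    n = 2 ** depth
    h, w = image.shape
    if h % n or w % n:
        raise ValueError("image sides must be divisible by 2**depth")
    th, tw = h // n, w // n
    return [image[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            for r in range(n) for c in range(n)]


def soft_vote(region_probs):
    """Sum per-region softmax vectors; return (winning class, normalized scores)."""
    total = np.sum(np.asarray(region_probs, dtype=float), axis=0)
    return int(np.argmax(total)), total / total.sum()


# Toy usage: a 32x32 image split at depth 1 and depth 2.
image = np.zeros((32, 32))
quadrants = quadtree_regions(image, 1)        # 4 tiles of 16x16
decahexadrants = quadtree_regions(image, 2)   # 16 tiles of 8x8

# Hard-coded stand-ins for the softmax outputs of four quadrant classifiers
# over two classes (assumed values, for illustration only).
quadrant_probs = [[0.60, 0.40],
                  [0.20, 0.80],
                  [0.30, 0.70],
                  [0.55, 0.45]]
label, scores = soft_vote(quadrant_probs)
```

In this toy case a hard majority vote over the four quadrants would tie at two votes each, whereas summing the probabilities selects class 1, because the two quadrants favouring class 1 are more confident than the two favouring class 0.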

Abstract

Recognition of handwritten characters is a challenging task. Variations in writing style from one person to another, and for a single individual from time to time, make the task harder. Hence, identifying the local invariant patterns of a handwritten character or digit is very difficult. These challenges can be overcome by exploiting various script-specific characteristics and training the OCR system on these special traits. Finding ubiquitous invariant patterns and peculiarities that apply to handwritten characters or digits of multiple scripts is much more difficult. In the present work, a non-explicit feature based approach, more specifically a multi-column multi-scale convolutional neural network (MMCNN) based architecture, has been proposed for this purpose. A deep quad-tree based staggered prediction model has been proposed for faster character recognition. These are the most significant contributions of the present work. The proposed methodology has been tested on 9 publicly available datasets of isolated handwritten characters or digits of Indic scripts. Promising results have been achieved by the proposed system for all of the datasets. A comparative analysis against some contemporary OCR systems has also been performed to demonstrate the superiority of the proposed system. We have also evaluated our system on the MNIST dataset and achieved a maximum recognition accuracy of 99.74%, without any data augmentation of the original dataset.

Introduction

Optical Character Recognition (OCR) can be defined as the process of automatically recognizing characters from an optically scanned or digitized page of handwritten or printed text. Although challenging, it contributes substantially to improving the interface between man and machine in many applications. The already difficult task of OCR, however, becomes more challenging when the characters to be recognized are handwritten and stylistically varied. Determining the local invariant patterns significant to a specific handwritten character or digit is difficult, as handwriting style varies from one person to another. After years of research, from Nipkow's sequential scanner [1] to template-based similarity matching techniques [2], LeNet [3] and multi-pass hybrid approaches [2], OCR systems have evolved considerably. As shown in Fig. 1, feature based approaches to OCR can be classified into two major categories: spatial domain techniques [4] and transform domain techniques [5]. After features are extracted, sophisticated classification models are used to recognize printed or handwritten character images.

Researchers working on OCR systems have proposed a wide array of features for automatically recognizing printed or handwritten characters. While most of these features are generic, some utilize script-specific properties to improve the performance of the underlying classifier. Syntactic or formal grammar based features [6], moment based features [7], graph-theoretic approaches [8], shadow based features [9] and gradient based features [10] are some of the most popular examples. In contrast to the explicit feature based approaches described above, where handcrafted features are explicitly designed and extracted from digitized pattern images, some researchers have proposed methods in which features do not need to be explicitly designed and extracted. As raw or normalized pattern images are fed into the system, the prediction model adjusts itself over multiple iterations to minimize the misclassification error. Artificial neural network based approaches [3], [11] and HMM or Markov-model based approaches [12], [13] are examples of such non-explicit feature (NeF) based OCR systems. Connection weights and statistically derived parameters play a significant role in those approaches. As a result of decades of vigorous research, significant progress has been made in the recognition of handwritten characters or digits of Roman scripts [14], [15], some European languages (e.g. Spanish and French) and a few Asian languages (e.g. Japanese [1] and Chinese [16]). Developing an OCR system for the recognition of handwritten characters or digits of Indian scripts, however, is still in its infancy.

Despite the progress made over the past decade in developing OCR systems for handwritten characters of Indian scripts, a commercially successful, comprehensive approach to recognizing handwritten characters and digits of multiple Indian scripts is yet to emerge. Although Pal et al. [17] have advocated a feature based approach for the recognition of handwritten digits belonging to six popular Indic scripts (shown in Fig. 2), their system does not address the challenging job of recognizing handwritten characters of multiple Indic scripts within a single system. The performance of that system in terms of prediction time per character has also not been thoroughly investigated. The motivation of the present work is to propose an approach towards a high-performance, ubiquitous OCR system that works equally well for handwritten characters and/or digits belonging to multiple Indian scripts.

A NeF based approach to handwritten character recognition has been taken in the present work. A multi-column multi-scale convolutional neural network (MMCNN) based architecture has been proposed for this purpose. A deep quad-tree based staggered prediction model has also been proposed to reduce prediction time. The MMCNN based architecture and the deep quad-tree based lazy prediction model are among the significant contributions of the present work. The proposed feature extraction technique has been tested on publicly available benchmark datasets of handwritten characters or digits of the five most popular [18] Indian scripts. Promising results have been achieved by the proposed system for all of the datasets. A comparative analysis against some contemporary OCR systems has also been performed to demonstrate the superiority of the proposed system.
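
One plausible reading of a "staggered" or "lazy" prediction model is an early-exit cascade: a coarse predictor is queried first, and finer (more expensive) levels of the quad tree are consulted only when the current level is not confident enough. The sketch below illustrates that idea only. The stopping rule is not stated in this excerpt, so the confidence threshold, the function name `staggered_predict`, and the toy predictors `coarse` and `fine` are all assumptions.

```python
import numpy as np


def staggered_predict(image, level_predictors, threshold=0.9):
    """Query predictors from coarse to fine and stop at the first confident one.

    Each predictor maps an image to a vector of class probabilities.
    Returns (predicted class, index of the level that produced it).
    The threshold value is an assumed hyperparameter, not taken from the paper.
    """
    if not level_predictors:
        raise ValueError("at least one predictor is required")
    for level, predict in enumerate(level_predictors):
        probs = np.asarray(predict(image), dtype=float)
        if probs.max() >= threshold:
            break  # confident enough: skip the deeper levels
    # If no level was confident, the deepest level's answer is used.
    return int(probs.argmax()), level


# Toy stand-ins for trained level-wise classifiers (hypothetical).
def coarse(_image):
    return np.array([0.60, 0.40])   # unsure


def fine(_image):
    return np.array([0.05, 0.95])   # confident


image = np.zeros((32, 32))
label, level_used = staggered_predict(image, [coarse, fine])
```

With these toy predictors the coarse level falls below the threshold, so the decision is deferred to the finer level. An input on which the first level is already confident would never reach the deeper levels, which is where any saving in prediction time would come from.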

The rest of the paper is organized as follows: Section 2 presents a brief overview of NeF based approaches to OCR systems, Section 3 describes the present work, Section 4 presents the datasets used in the experimental setup, Section 5 describes the experimental results and, finally, a brief conclusion is drawn from the results.

Section snippets

A brief overview on non-explicit feature based approach towards OCR systems

Feature selection is an important step in any pattern recognition task. Selecting the best feature-set for a test image often proves to be a major factor in successfully identifying that image. Among different feature extraction techniques mentioned in contemporary literature, some are generic e.g. basic shape based primitive features [1], [2], gradient based features [4], [19], shadow features [20], [9], moment-based features [7], contour based features [9], [10] etc. which have been

The present work

As mentioned above, a NeF based approach using an MMCNN based architecture for the recognition of isolated handwritten characters of popular Indian scripts has been proposed in the present work. The connection weights between the final layer and the softmax classifier of each column of the architecture are used as implicit feature descriptors. Once the network converges and the connection weights are learnt, the pattern image is forward propagated through the network and features are extracted between

Experimental results

As discussed earlier, the objectives of the present work are two-fold: (a) to propose a generalized feature extraction technique that is equally effective for the recognition of handwritten characters and digits of significantly varied, complex Indic scripts and (b) to demonstrate the superiority of the proposed method over its existing contemporaries. A multi-scale deep quad tree based ubiquitous feature extraction technique has been proposed for this purpose (discussed in Section 3). The

Conclusion

A non-explicit feature based approach towards a ubiquitous OCR system for the recognition of handwritten characters and/or digits of the top five Indic scripts is proposed in the present work. An MMCNN based architecture has been utilized for feature extraction. A multi-scale convolutional sampling technique has been proposed for extracting more robust, invariant features from pattern images. The multi-column architecture ensures an effective system. The multi-scale convolutional sampling,

Acknowledgements

The authors are thankful to the Center for Microprocessor Application for Training Education and Research (CMATER) and the Project on Storage Retrieval and Understanding of Video for Multimedia (SRUVM) of the Computer Science and Engineering Department, Jadavpur University, for providing infrastructure facilities during the progress of the work. The work reported here has been partially funded by University with Potential for Excellence (UPE), Phase-II, UGC, Government of India.

References (79)

  • A. Dutta et al. Bengali alpha-numeric character recognition using curvature features. Pattern Recognit. (1993)
  • S. Roy et al. Handwritten isolated Bangla compound character recognition: a new benchmark using a novel deep learning approach. Pattern Recognit. Lett. (2017)
  • N. Das. A statistical–topological feature combination for recognition of handwritten numerals. Appl. Soft Comput. (2012)
  • C.-L. Liu et al. A new benchmark on the recognition of handwritten Bangla and Farsi numeral characters. Pattern Recognit. (2009)
  • O.D. Trier et al. Feature extraction methods for character recognition - a survey. Pattern Recognit. (1996)
  • Y. LeCun et al. Word-level training of a handwritten word recognizer based on convolutional neural networks
  • R. Sarkhel et al. An enhanced harmony search method for Bangla handwritten character recognition using region sampling
  • S. Wendling et al. Hadamard and Haar transforms and their power spectrum in character recognition
  • H.-Y. Feng et al. Decomposition of polygons into simpler components: feature generation for syntactic pattern recognition. IEEE Trans. Comput. (1975)
  • P.K. Singh, R. Sarkar, and M. Nasipuri, “A study of moment based features on handwritten digit recognition,” vol. 2016,...
  • S. Kahan et al. On the recognition of printed characters of any font and size. IEEE Trans. Pattern Anal. Mach. Intell. (1987)
  • A. Roy et al. A new quad tree based feature set for recognition of handwritten Bangla numerals
  • S. Basu, N. Das, R. Sarkar, M. Kundu, M. Nasipuri, and D.K. Basu, “An MLP based approach for recognition of handwritten...
  • A. de et al. The recognition of handwritten numeral strings using a two-stage HMM-based method. Int. J. Doc. Anal. Recognit. (2003)
  • A. Vinciarelli et al. Offline recognition of unconstrained handwritten texts using HMMs and statistical language models. IEEE Trans. Pattern Anal. Mach. Intell. (2004)
  • S. Mori et al. Historical review of OCR research and development. Proc. IEEE (1992)
  • G. Nagy. Chinese character recognition: a twenty-five-year retrospective
  • U. Pal et al. Handwritten Numeral Recognition of Six Popular Indian Scripts
  • M. Encarta. Languages spoken by more than 10 million people – Table – MSN encarta. Arch. Orig. (2007)
  • H.B. Kekre et al. Devnagari handwritten character recognition using LBG vector quantization with gradient masks
  • A. Negi et al. An OCR system for Telugu
  • R. Bajaj et al. Devnagari numeral recognition by combining decision of multiple connectionist classifiers. Sadhana (2002)
  • S.K. Parui et al. Online handwritten Bangla character recognition using HMM. IEEE (2008)
  • A. Kundu et al. Recognition of handwritten word: first and second order hidden Markov model based approach
  • Y. Bengio. Learning deep architectures for AI. Found. Trends® Mach. Learn. (2009)
  • A. Ul-Hasan et al. Offline printed Urdu Nastaleeq script recognition with bidirectional LSTM networks
  • A. Ray et al. Text recognition using deep BLSTM networks
  • D. Cireşan et al. Multi-column deep neural networks for offline handwritten Chinese character classification
  • A. Pal et al. Recognition of online handwritten Bangla characters using hierarchical system with denoising autoencoders

Ritesh Sarkhel received his B.C.S.E degree from Jadavpur University in 2012. He worked as an R&D Engineer at Samsung Research Institute, Noida from 2012 to 2014. He is currently pursuing an M.C.S.E degree at Jadavpur University. His areas of current research interest are OCR of handwritten text, optimization techniques and computer vision.

Nibaran Das received his B.Tech degree in Computer Science and Technology from Kalyani Govt. Engineering College under Kalyani University, in 2003. He received his M.C.S.E and Ph.D. (Engg.) degrees from Jadavpur University, in 2005 and 2012 respectively. He joined Jadavpur University as a faculty member in 2006. His areas of current research interest are OCR of handwritten text, optimization techniques, deep learning and image processing. He has been an editor of the Bengali monthly magazine “Computer Jagat” since 2005.

Aritra Das received his B.Tech degree in Computer Science and Engineering from Narula Institute of Technology under Maulana Abul Kalam Azad University of Technology, formerly known as West Bengal University of Technology, in 2015. He is currently pursuing a Master's degree in Computer Science and Engineering at Jadavpur University. His current areas of research are machine learning, artificial intelligence, deep learning and computer vision.

Mahantapas Kundu received his B.E.E, M.E.Tel.E and Ph.D. (Engg.) degrees from Jadavpur University, in 1983, 1985 and 1995, respectively. Prof. Kundu has been a faculty member of J.U since 1988. His areas of current research interest include pattern recognition, image processing, multimedia database, and artificial intelligence.

Mita Nasipuri received her B.E.Tel.E., M.E.Tel.E., and Ph.D. (Engg.) degrees from Jadavpur University, in 1979, 1981 and 1990, respectively. Prof. Nasipuri has been a faculty member of J.U since 1987. Her current research interests include image processing, pattern recognition, and multimedia systems. She is a senior member of the IEEE, U.S.A., and a Fellow of I.E (India) and W.B.A.S.T, Kolkata, India.
