Elsevier

Pattern Recognition

Volume 48, Issue 4, April 2015, Pages 1225-1234

Importance sampling based discriminative learning for large scale offline handwritten Chinese character recognition

https://doi.org/10.1016/j.patcog.2014.09.014

Highlights

  • Propose an importance sampling based discriminative learning framework for large-scale classification problems.

  • Introduce rejection sampling, a boosting algorithm and MCE to estimate sample importance weights.

  • Compare the methods on a large-scale character recognition problem and summarize them under a unified framework.

Abstract

This paper reports a discriminative learning framework based on importance sampling for large-scale classification tasks. The framework assigns different weights to samples according to a sample importance weight function derived from the Bayesian classification rule. Three methods are used to calculate the sample importance weights for learning the modified quadratic discriminant function (MQDF). (1) Rejection sampling: important samples are selected as a training subset, and different levels of MQDFs are trained by focusing on different types of samples. (2) Boosting: the sample importance weights are modified iteratively according to the recognition performance. (3) Minimum classification error (MCE): the parameter of the importance weight function is estimated under the MCE rule. Cursive samples tend to be misclassified, or to be prone to misclassification, by an MQDF learned under the maximum likelihood estimation (MLE) rule. The proposed importance sampling framework therefore makes the MQDF classifier focus more on cursive samples than on normal samples. This strategy allows the MQDF to achieve higher accuracy while keeping its computational complexity low. Comprehensive experiments on three handwritten Chinese character datasets demonstrate that the proposed framework achieves promising character recognition accuracy.

Introduction

As pattern recognition technologies have matured and been studied in depth, handwritten Chinese character recognition has attracted increasing attention and interest. There have been two milestones in the history of Chinese character recognition. The first was the introduction of the directional element feature (DEF) [1], [2], extracted from binary images and later extended to gray-scale images [3], [4]. The directional feature greatly improves feature representation ability and is now commonly used for character recognition. The second milestone was the application of the MQDF classifier [5], which significantly improved character recognition accuracy. Many research results on the recognition of constrained offline handwritten Chinese characters have been reported [6], [7], [8], [9], [10]; for example, the highest reported recognition accuracy on the HCL2000 dataset [11] is 98.56% [7]. However, the recognition accuracy for cursive characters remains relatively low, below 95% [12]. Many challenges therefore remain in the recognition of handwritten Chinese characters [13].

The recognition of cursive handwritten Chinese characters is a challenging large-scale classification problem. Because cursive Chinese characters are so diverse, the available training samples are usually insufficient. In addition, the diverse writing styles of cursive characters reduce the inter-class variance and increase the intra-class variance, which makes character classification more ambiguous.

There are two main types of classification approaches for character recognition: generative methods and discriminative learning. A generative method, e.g., MQDF, models each class with a specific probability distribution, so training a generative model is essentially a parameter estimation problem. Maximum likelihood estimation (MLE) and Bayesian estimation are two typical and effective methods for estimating the parameters. MLE treats each parameter as a fixed but unknown quantity and chooses the value under which the model generates the observations with maximum likelihood. Bayesian estimation, in contrast, treats the parameters as random variables. Because of its computational simplicity, MLE is widely used in large-scale recognition problems, and Duda et al. [14] indicated that, in the absence of specific a priori knowledge, the two methods give similar estimates. For cursive Chinese character recognition, however, the performance of an MQDF whose parameters are estimated by MLE is far from satisfactory. First, the MQDF assumes that the features of each class follow a Gaussian distribution, an assumption that cursive characters satisfy poorly. Second, MLE does not consider discriminative information among different classes. Therefore, the MLE-based MQDF does not perform optimally. Discriminative learning is an effective means of improving classification performance, but its high computational complexity has limited its application to large-scale classification problems. Discriminative learning for handwritten Chinese character recognition that takes both recognition accuracy and computational complexity into account therefore remains an open research problem.
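The remark attributed to Duda et al. can be illustrated with the textbook case of estimating a univariate Gaussian mean with known variance under a conjugate Gaussian prior. The sketch below is a minimal illustration, not part of this paper's method: it compares the MLE (the sample mean) with the Bayesian posterior mean. The function names and the prior parameters `mu0` and `sigma0_2` are hypothetical.

```python
import numpy as np

def mle_mean(x):
    """MLE of a Gaussian mean: the sample mean."""
    return float(np.mean(x))

def bayes_posterior_mean(x, sigma2, mu0, sigma0_2):
    """Posterior mean of a Gaussian mean with known variance sigma2,
    under a conjugate prior N(mu0, sigma0_2)."""
    n = len(x)
    xbar = np.mean(x)
    # Weighted average of the sample mean and the prior mean;
    # the prior's influence shrinks as n * sigma0_2 grows.
    return float((n * sigma0_2 * xbar + sigma2 * mu0) / (n * sigma0_2 + sigma2))

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=1000)
print(mle_mean(x), bayes_posterior_mean(x, sigma2=1.0, mu0=0.0, sigma0_2=1e6))
```

With a vague prior (large `sigma0_2`) the two estimates essentially coincide; only a strong, specific prior pulls the Bayesian estimate away from the MLE.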

To gain the advantages of discriminative learning while keeping computational complexity low, we propose an importance sampling based discriminative learning framework in which discriminative information is utilized in the MQDF training process. The basic idea is to assign higher weights to samples that are misclassified (or prone to be misclassified) and lower weights to correctly classified samples. Our preliminary idea and experimental results were partially reported in a previous paper [15]; this paper extends the theory and experiments and describes them in more detail.

The rest of this paper is organized as follows. Related works are briefly reviewed in Section 2. The MQDF is introduced in Section 3, and the importance sampling based discriminative learning framework is described in Section 4. Rejection sampling, boosting and MCE, which are used to estimate the sample importance weights, are introduced in Sections 5, 6 and 7, respectively. The experimental results and a summary of the proposed algorithms are presented in Section 8. Finally, conclusions are drawn in Section 9.


Related works

Discriminative learning for large scale classification is an important and challenging research topic in pattern recognition. Many works have contributed to improving the recognition accuracy of handwritten Chinese characters. Although MQDF remains the most widely used classifier for Chinese character recognition, it suffers from the non-Gaussian characteristics of the data distribution. To better represent the data distribution, the Gaussian mixture model (GMM) is adopted [16]. Theoretically,

MQDF

In this paper, MQDF based on MLE is used as the baseline classifier. MQDF is derived from the quadratic discriminant function (QDF), a Bayesian classifier that assumes that the samples of each class follow a Gaussian distribution. The QDF can be expressed as

$$d_i(\mathbf{x}) = (\mathbf{x}-\boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x}-\boldsymbol{\mu}_i) + \log|\boldsymbol{\Sigma}_i|$$

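The mean vector $\boldsymbol{\mu}_i$ and covariance matrix $\boldsymbol{\Sigma}_i$ are usually estimated by MLE. By applying eigenvalue decomposition to $\boldsymbol{\Sigma}_i$, the QDF can be rewritten as [5]

$$d_i(\mathbf{x}) = \sum_{j=1}^{n} \frac{1}{\lambda_{ij}} \left[\boldsymbol{\varphi}_{ij}^T(\mathbf{x}-\boldsymbol{\mu}_i)\right]^2 + \sum_{j=1}^{n} \log \lambda_{ij}$$

where $\lambda_{ij}$ and $\boldsymbol{\varphi}_{ij}$ are the $j$-th eigenvalue and eigenvector of $\boldsymbol{\Sigma}_i$, and $n$ is the feature dimension.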
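The QDF and its MQDF truncation [5] can be checked numerically. The sketch below is an illustration under stated assumptions, not the authors' implementation: it fits a class Gaussian by MLE, evaluates the QDF directly, and evaluates the usual MQDF form, which keeps the k largest eigenpairs of Σ_i and replaces the remaining eigenvalues by a constant δ. Setting δ to the mean of the discarded eigenvalues is an assumed heuristic (the excerpt does not give the paper's k or δ), and all helper names are hypothetical.

```python
import numpy as np

def fit_gaussian_mle(X):
    """MLE of a class-conditional Gaussian: sample mean and 1/N covariance."""
    mu = X.mean(axis=0)
    diff = X - mu
    return mu, diff.T @ diff / X.shape[0]

def qdf(x, mu, sigma):
    """Direct QDF: (x-mu)^T Sigma^{-1} (x-mu) + log|Sigma| (smaller = more likely)."""
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    return diff @ np.linalg.solve(sigma, diff) + logdet

def mqdf(x, mu, sigma, k, delta=None):
    """MQDF: keep the k largest eigenpairs, replace minor eigenvalues by delta."""
    n = mu.shape[0]
    lam, phi = np.linalg.eigh(sigma)        # eigenvalues in ascending order
    lam, phi = lam[::-1], phi[:, ::-1]      # reorder to descending
    if delta is None:
        # Assumption: delta = mean of the discarded minor eigenvalues.
        delta = lam[k:].mean() if k < n else 1.0
    diff = x - mu
    proj = phi[:, :k].T @ diff              # projections on the principal axes
    residual = diff @ diff - proj @ proj    # energy outside the principal subspace
    return (np.sum(proj**2 / lam[:k]) + residual / delta
            + np.sum(np.log(lam[:k])) + (n - k) * np.log(delta))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) * np.array([3.0, 2.0, 1.5, 1.0, 0.5])
mu, sigma = fit_gaussian_mle(X)
print(qdf(X[0], mu, sigma), mqdf(X[0], mu, sigma, k=5), mqdf(X[0], mu, sigma, k=2))
```

With k = n the residual term vanishes and the MQDF reduces to the eigen-decomposed QDF above; with k < n only k eigenvectors per class need to be stored and projected, which is where MQDF saves memory and computation.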

Importance sampling based discriminative learning framework

The mean vector and covariance matrix are sufficient to describe a Gaussian distribution, and both can be estimated in the following form:

$$E(\phi(X)) = \int_x \phi(x)\, p(x)\, dx$$

where $p(x)$ is a distribution function that is usually unknown or difficult to sample from. The importance sampling method introduces a sampling function $q(x)$ to sample from a weighted distribution [35]. The method is formulated as

$$E(\phi(X)) = \int_x \phi(x)\, \underbrace{\frac{p(x)}{q(x)}}_{\pi(x)}\, q(x)\, dx = E[\pi(X)\phi(X)]$$

where the last expectation is taken with respect to $q(x)$, and $\pi(X)$ is the sample
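The identity can be sketched numerically. The first function is a plain importance sampling estimator of E(φ(X)) that draws from q(x) and reweights each draw by π(x) = p(x)/q(x); the second shows how per-sample weights turn the MLE of a Gaussian mean vector and covariance matrix into weighted estimates, the mechanism the framework builds on. The Gaussian choices of p and q, the weight normalization, and all function names are illustrative assumptions; the paper's own π derived from the Bayesian rule is not reproduced here.

```python
import numpy as np

def gaussian_pdf(x, mean, std):
    """Univariate normal density."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

def importance_estimate(phi, p_pdf, q_sampler, q_pdf, n, rng):
    """Estimate E_p[phi(X)] from n draws of q, weighted by pi(x) = p(x) / q(x)."""
    x = q_sampler(rng, n)
    pi = p_pdf(x) / q_pdf(x)
    return np.mean(pi * phi(x))

def weighted_mean_cov(X, w):
    """Weighted (self-normalized) MLE of a Gaussian mean vector and covariance."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    mu = w @ X
    diff = X - mu
    return mu, (w[:, None] * diff).T @ diff

rng = np.random.default_rng(0)
est = importance_estimate(
    phi=lambda x: x,                                  # estimate the mean of p = N(1, 1)
    p_pdf=lambda x: gaussian_pdf(x, 1.0, 1.0),
    q_sampler=lambda r, n: r.normal(0.0, 2.0, size=n),
    q_pdf=lambda x: gaussian_pdf(x, 0.0, 2.0),
    n=200_000, rng=rng)
print(est)  # close to 1.0
```

With uniform weights `weighted_mean_cov` reduces to the ordinary MLE; raising the weight of a sample pulls the estimated mean and covariance toward it.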

Rejection sampling strategy

Rejection sampling is a particular case of importance sampling [37], in which each sample is rejected or accepted according to specific prerequisites. The counterpart of rejecting samples is selecting important samples for training. As described in Section 4, the generalized recognition confidence reflects sample importance, and an extended form of this confidence is used to measure the samples. If a training sample is recognized with confidence $R_{ij}$ lower than $Th$, it will be
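The selection step can be sketched as follows. Because the excerpt does not define $R_{ij}$, the code uses a hypothetical confidence, namely the MQDF distance of the nearest rival class minus that of the true class, and assumes that samples whose confidence falls below Th are the ones kept for training. Both choices, and the function names, are assumptions for illustration only.

```python
import numpy as np

def confidence_margin(dist, labels):
    """Hypothetical confidence: nearest-rival distance minus true-class distance.

    dist:   (N, C) discriminant distances (smaller = better, as for MQDF).
    labels: (N,) true class indices.
    Negative values mean a misclassified sample; small positive values mean
    a correctly classified sample close to the decision boundary.
    """
    rows = np.arange(dist.shape[0])
    true_d = dist[rows, labels]
    rival = np.array(dist, dtype=float)
    rival[rows, labels] = np.inf          # exclude the true class
    return rival.min(axis=1) - true_d

def select_important(dist, labels, th):
    """Indices of samples whose confidence is below th (assumed selection rule)."""
    return np.flatnonzero(confidence_margin(dist, labels) < th)

dist = np.array([[1.0, 5.0, 6.0],    # confidently correct
                 [2.0, 2.5, 9.0],    # correct but near the boundary
                 [4.0, 1.0, 7.0]])   # misclassified
labels = np.array([0, 0, 0])
print(select_important(dist, labels, th=1.0))
```

Lowering `th` keeps only misclassified or near-boundary samples, which matches the idea of training an MQDF that focuses on the hard (typically cursive) samples.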

Sample importance weight estimated under the boosting algorithm

The boosting method is widely applied in face detection and binary classification tasks. It provides high accuracy at high speed and has been extended to multi-class classification problems. There are many versions of multi-class AdaBoost [41], [42], [43], [44], [45], and their weighting functions fall mainly into two categories. The first category updates the sample weight according to whether a sample is recognized correctly, e.g., discrete AdaBoost
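As one concrete instance of the first category, the sketch below implements the sample reweighting step of SAMME, a published multi-class extension of discrete AdaBoost (Zhu et al.). Whether SAMME is among the variants in [41], [42], [43], [44], [45] or is used in this paper is not stated in the excerpt; the function name is hypothetical, and in this setting the weak learner would presumably be an MQDF retrained on the reweighted samples.

```python
import numpy as np

def samme_update(weights, correct, n_classes):
    """One SAMME-style reweighting round.

    weights:   (N,) current sample weights.
    correct:   (N,) boolean array, True where the current classifier is right.
    n_classes: number of classes K (K = 3755 for GB2312-80 level-1).
    Returns the renormalized weights and the classifier weight alpha.
    """
    err = np.sum(weights[~correct]) / np.sum(weights)
    err = np.clip(err, 1e-12, 1 - 1e-12)           # avoid log(0)
    alpha = np.log((1 - err) / err) + np.log(n_classes - 1)
    # Only misclassified samples are scaled up, by exp(alpha).
    new_w = weights * np.exp(alpha * (~correct))
    return new_w / new_w.sum(), alpha

w = np.full(4, 0.25)
correct = np.array([True, True, True, False])
new_w, alpha = samme_update(w, correct, n_classes=3)
print(new_w, alpha)
```

The `log(K - 1)` term is what distinguishes SAMME from binary discrete AdaBoost: it keeps alpha positive as long as the weighted error is below that of random guessing among K classes.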

Sample importance weight estimated under MCE

In Sections 5 and 6, the sample importance weights are estimated by the rejection sampling and boosting methods, respectively. In this section, we directly estimate the parameter σ of the sample importance weight function under the MCE rule.

The choice of σ has a significant impact on the distribution of $\pi_w(R)$. As illustrated in Fig. 3, if the distribution of $R$ is fixed, the sample importance weight is a function
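The excerpt does not give the form of π_w(R) or how σ enters it, so the sketch below does not reproduce them. As background only, it computes the standard MCE smoothed loss for a distance-based classifier such as MQDF, using the hard-minimum rival (the limit of the usual soft-max misclassification measure) and a sigmoid with slope ξ (`xi`). Its derivative ξℓ(1 − ℓ) is largest for samples on the decision boundary, which is one way to see why MCE training acts like a sample weighting that emphasizes ambiguous samples. The function name is hypothetical.

```python
import numpy as np

def mce_loss(dist, labels, xi=1.0):
    """Standard MCE smoothed error for a distance-based classifier.

    Misclassification measure: d = d_true - min_rival d_rival (d > 0 is an error).
    Loss: l = sigmoid(xi * d), a differentiable surrogate of the 0-1 error.
    Returns per-sample losses and gradient magnitudes dl/dd = xi * l * (1 - l).
    """
    rows = np.arange(dist.shape[0])
    true_d = dist[rows, labels]
    rival = np.array(dist, dtype=float)
    rival[rows, labels] = np.inf          # exclude the true class
    d = true_d - rival.min(axis=1)
    loss = 1.0 / (1.0 + np.exp(-xi * d))
    grad = xi * loss * (1.0 - loss)
    return loss, grad

dist = np.array([[1.0, 5.0],   # correct, far from the boundary
                 [3.0, 3.0],   # exactly on the boundary
                 [6.0, 1.0]])  # misclassified
loss, grad = mce_loss(dist, np.array([0, 0, 0]))
print(loss, grad)
```

Increasing `xi` sharpens the sigmoid, so the gradient weight concentrates on a narrower band of samples around the boundary.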

Datasets and experiment setup

HCL2000 (Handwritten Character Library 2000) is a publicly released dataset [11]. The standard training set contains 700 subsets, from xx001 to xx700, and the testing set contains 300 subsets, from hh001 to hh300. Each subset includes the 3755 level-1 simplified Chinese characters of GB2312-80. The dataset is widely used for investigating recognition algorithms. Some sample characters are illustrated in Fig. 5.
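Assuming each subset holds exactly one sample of each of the 3755 characters, the size of each split follows directly from the subset counts above:

```python
N_CHARS = 3755          # level-1 GB2312-80 characters per subset
TRAIN_SUBSETS = 700     # xx001 to xx700
TEST_SUBSETS = 300      # hh001 to hh300

# Assumption: one sample of every character per subset.
train_samples = TRAIN_SUBSETS * N_CHARS
test_samples = TEST_SUBSETS * N_CHARS
print(train_samples, test_samples)  # 2628500 1126500
```

Under the same assumption each class has 700 training samples, which is the per-class sample size available for estimating each MQDF mean vector and covariance matrix.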

The THU-HCD dataset was collected by the Center for Intelligent Image and Document Information

Conclusion

This paper outlined our research on an importance sampling based discriminative learning framework for large-scale classification. Three sample importance weight estimation methods were proposed: rejection sampling, boosting and MCE. All of these methods employ MQDF as the baseline classifier and indirectly optimize the classifier parameters under the importance sampling based discriminative learning framework. The methods were tested on three Chinese handwriting datasets. The experimental

Conflict of interest statement

The authors declare that they have no conflicts of interest.

Yanwei Wang received his B.S. degree in Electronic and Information Engineering from Jilin University in 2007 and his Ph.D. degree in Electronic Engineering from Tsinghua University in 2013. He is now a postdoctoral researcher at Tsinghua University. His current research interests include offline Chinese character and text line recognition.

References (47)

  • S. Meng et al., Handwritten numeral recognition using gradient and curvature of gray scale image, Pattern Recognit. (2002)
  • F. Kimura et al., Modified quadratic discriminant functions and its application to Chinese character recognition, IEEE Trans. Pattern Anal. Mach. Intell. (1987)
  • H.-L. Liu, X.-Q. Ding, Handwritten character recognition using gradient feature and quadratic classifier with multiple...
  • X.-B. Liu, Y.-D. Jia, M. Tan, Geometrical statistical modeling of character structures for natural stroke extraction...
  • H.-G. Zhang, J. Yang, W.-H. Deng, J. Guo, Handwritten Chinese character recognition using local discriminate projection...
  • H.-G. Zhang, J. Guo, G. Chen, C.-G. Li, HCL2000 a large scale handwritten Chinese character database for handwritten...
  • F. Yin, Q.-F. Wang, X.-Y. Z, C.-L. Liu, ICDAR 2013 Chinese handwriting recognition competition, in: Proceedings of the...
  • C.-L. Liu, F. Yin, D.-H. Wang, Q.-F. Wang, Online and offline handwritten Chinese character recognition: benchmarking...
  • R.O. Duda et al., Pattern Classification (2001)
  • Y.-W. Wang, X.-Q. Ding, C.S. Liu, Discriminative learning based offline handwritten Chinese character recognition, in:...
  • T. Hamamura, B. Irie, T. Nishimoto, N. Ono, S. Sagayama, Concurrent optimization of context clustering and GMM for...
  • J. Friedman et al., Additive logistic regression: a statistical view of boosting, Ann. Stat. (2000)
  • C. Cortes et al., Support-vector networks, Mach. Learn. (1995)

Qiang Fu received his B.S. degree in 2002 and his Ph.D. degree in Electronic Engineering in 2008 from Tsinghua University. He has been working at Microsoft Research Asia as an Associate Researcher since 2008. His research interests include pattern recognition, data mining, and log analysis for system performance diagnosis.

Xiaoqing Ding (SM’07) received the B.E. degree from Tsinghua University, Beijing, China, in 1962. She is currently a Professor and Ph.D. Supervisor in the Department of Electronic Engineering, Tsinghua University. She is an IAPR Fellow and an IEEE Fellow. Her research interests include pattern recognition, image processing, character recognition, biometric identification, computer vision and video surveillance.

Changsong Liu is an Associate Professor in the Department of Electronics Engineering at Tsinghua University. He received B.S. degrees in both Mechanics Engineering and Electronics Engineering in 1992, a Master's degree in Electronics Engineering in 1995, and a Ph.D. degree in 2007 from Tsinghua University, China. His research interests include image processing and pattern recognition.
