Importance sampling based discriminative learning for large scale offline handwritten Chinese character recognition

doi:10.1016/j.patcog.2014.09.014

Pattern Recognition

Volume 48, Issue 4, April 2015, Pages 1225-1234

https://doi.org/10.1016/j.patcog.2014.09.014

Highlights

• Propose an importance sampling based discriminative learning framework for large scale classification problems.
• Introduce rejection sampling, a boosting algorithm and MCE to estimate the sample importance weights.
• Compare the methods on a large scale character recognition problem and summarize them under a unified framework.

Abstract

This paper reports a discriminative learning framework based on importance sampling for large-scale classification tasks. The framework assigns different weights to samples according to a sample importance weight function derived from the Bayesian classification rule. Three methods are used to calculate the sample importance weights for learning the modified quadratic discriminant function (MQDF). (1) Rejection sampling method. The method selects important samples as a training subset and trains different levels of MQDFs by focusing on different types of samples. (2) Boosting algorithm. The algorithm modifies the sample importance weights iteratively according to the recognition performance. (3) Minimum classification error (MCE) rule. The parameter of the importance weight function is estimated using the MCE rule. Cursive samples are usually misclassified, or prone to be misclassified, by the MQDF learned under the maximum likelihood estimation (MLE) rule. The proposed importance sampling framework therefore makes the MQDF classifier focus more on cursive samples than on normal samples. Such a strategy allows the MQDF to achieve higher accuracy while maintaining low computational complexity. Comprehensive experiments on three Chinese handwritten character datasets demonstrate that the proposed framework achieves promising character recognition accuracy.

Introduction

With the maturity and in-depth study of pattern recognition technologies, research on handwritten Chinese character recognition has attracted increasing attention and interest. There are two milestones in the history of Chinese character recognition. The first milestone is the demonstration of the extraction of a directional element feature (DEF) [1], [2] from a binary image, and this achievement was subsequently extended to gray scale images [3], [4]. The directional feature greatly improves the feature representation ability and is now commonly used for character recognition. The second milestone is the application of the MQDF classifier [5], which significantly improves the character recognition accuracy. Many research results on the recognition of constrained offline handwritten Chinese characters have been reported [6], [7], [8], [9], [10]. For example, the highest recognition accuracy reported is 98.56% [7] for the HCL2000 dataset [11]. However, the recognition accuracy of cursive characters is relatively low, being less than 95% [12]. Therefore, many challenges must be overcome in the development of a framework for the recognition of handwritten Chinese characters [13].

The recognition of cursive handwritten Chinese characters is a challenging large scale classification problem. Due to the wide diversity of cursive Chinese characters, the training samples are usually insufficient for training. In addition, the diverse writing styles of cursive characters reduce the inter-class variances and increase the intra-class variances, thereby increasing the ambiguity of character classification.

There are two main types of classification approaches for character recognition: the generative method and discriminative learning. The generative method, e.g., MQDF, mainly models each class as a specific probability distribution. The training process of the generative model is exactly a parameter estimation problem. Maximum likelihood estimation (MLE) and Bayesian estimation are two typical and effective methods used to estimate the parameters. MLE regards the estimated parameter as a certain variable with a fixed value, which enables the model to generate observations with the maximum likelihood probability. For Bayesian estimation, the parameters are considered random variables. Due to its computational simplicity, MLE is widely used in large scale recognition problems. Duda [14] indicated that, in the absence of specific a priori knowledge, the estimation results of these two methods are similar. For cursive Chinese character recognition, the performance of the MQDF using the parameters estimated by MLE is far from satisfactory. The MQDF relies on the assumption that the features of each class satisfy a Gaussian distribution. However, such an assumption is poorly met by cursive characters. Furthermore, the MLE method does not consider discriminative information among different classes. Therefore, the MQDF based on the MLE method does not perform optimally. Discriminative learning is an effective means to improve the classification performance. However, its high computational complexity has limited its application in large scale classification problems. Therefore, discriminative learning for handwritten Chinese character recognition in which both the recognition accuracy and the computational complexity should be taken into consideration remains an open research problem.

To leverage the advantage of discriminative learning with low computational complexity, we propose an importance sampling based discriminative learning framework in which discriminative information is utilized in the MQDF training process. The basic concept is to assign higher weights to the misclassified (or prone to be misclassified) samples and lower weights to the correctly classified samples. Our preliminary idea and experimental results were reported in part in a previous paper [15]. This paper extends the theory and the experimental results and describes them in more detail.

The rest of this paper is organized as follows. Related works are briefly reviewed in Section 2. The MQDF is introduced in Section 3, and the importance sampling based discriminative learning framework is described in Section 4. Rejection sampling, boosting and MCE, which are used for estimating the sample importance weights, are introduced in Sections 5, 6 and 7, respectively. The experimental results and a summary of the proposed algorithms are presented in Section 8. Finally, the conclusion is presented in Section 9.

Section snippets

Related works

Discriminative learning for large scale classification is an important and challenging research topic in pattern recognition. Many works have contributed to improving the recognition accuracy of handwritten Chinese characters. Although MQDF remains the most widely used classifier for Chinese character recognition, it suffers from the non-Gaussian characteristics of the data distribution. To better represent the data distribution, the Gaussian mixture model (GMM) is adopted [16]. Theoretically,

MQDF

In this paper, MQDF based on MLE was used as the baseline classifier. MQDF is derived from the quadratic discriminant function (QDF), which is a Bayesian classifier that assumes that the samples of each class satisfy a Gaussian distribution. QDF can be expressed as the following equation: $d_{i} (x) = {(x - μ_{i})}^{T} Σ_{i}^{- 1} (x - μ_{i}) + \log | Σ_{i} |$
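
As an illustration of the QDF expression above, the following minimal Python sketch evaluates $d_{i} (x)$ for two-dimensional features and assigns a sample to the class with the smallest value. The function names `qdf_2d` and `classify`, the restriction to 2-D features and the toy class parameters are assumptions made for this example only; they are not taken from the paper.

```python
import math


def qdf_2d(x, mean, cov):
    """QDF value (x - mu)^T Sigma^{-1} (x - mu) + log|Sigma| for 2-D features.

    Hypothetical helper: restricted to 2x2 covariance matrices so that the
    inverse and determinant can be written in closed form.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    dx0 = x[0] - mean[0]
    dx1 = x[1] - mean[1]
    # Sigma^{-1} = (1 / det) * [[d, -b], [-c, a]]
    mahalanobis = (d * dx0 * dx0 - (b + c) * dx0 * dx1 + a * dx1 * dx1) / det
    return mahalanobis + math.log(det)


def classify(x, classes):
    """Return the label whose QDF value is smallest (the most likely class)."""
    return min(classes, key=lambda label: qdf_2d(x, *classes[label]))


# Toy class models (mean vector, covariance matrix); values are illustrative.
toy_classes = {
    "A": ((0.0, 0.0), ((1.0, 0.0), (0.0, 1.0))),
    "B": ((5.0, 5.0), ((1.0, 0.0), (0.0, 1.0))),
}
predicted = classify((1.0, 1.0), toy_classes)
```

In a real recognizer the feature dimension is far larger, which is why the eigenvalue-decomposed form discussed next is used in place of an explicit matrix inverse.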

The mean vector $μ_{i}$ and covariance matrix $Σ_{i}$ are usually estimated by MLE. By applying eigenvalue decomposition on $Σ_{i}$ , the QDF classifier can be rewritten as [5] $d_{i} (x) = \sum_{j = 1}^{n} \frac{1}{λ_{j}} [φ_{i j}^{T} ($

Importance sampling based discriminative learning framework

The mean vector and covariance matrix are sufficient to describe a Gaussian distribution. Both of them can be estimated by the following form: $E (ϕ (X)) = \int_{x \in ℝ} ϕ (x) p (x) d x,$ where $p (x)$ is a distribution function that is usually unknown or from which direct sampling is difficult to perform. The importance sampling method introduces a sampling function $q (x)$ to sample from a weighted distribution [35]. The method is formulated as $E (ϕ (X)) = \int_{x \in ℝ} ϕ (x) \underbrace{\frac{p (x)}{q (x)}}_{π (x)} q (x) d x = E [π (X) ϕ (X)],$ where $π (X)$ is the sample
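
The importance sampling identity in this section can be checked numerically. The sketch below is a generic Monte Carlo illustration, not the paper's training procedure: it assumes a standard normal target $p (x)$, a wider normal sampling function $q (x)$ and $ϕ (x) = x^{2}$, whose expectation under $p$ is 1. All function names and parameter values are chosen for this example.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of a univariate normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def importance_sampling_estimate(phi, p_pdf, q_pdf, q_sampler, n, rng):
    """Estimate E_p[phi(X)] by averaging pi(x) * phi(x) over draws from q."""
    total = 0.0
    for _ in range(n):
        x = q_sampler(rng)
        weight = p_pdf(x) / q_pdf(x)  # importance weight pi(x) = p(x) / q(x)
        total += weight * phi(x)
    return total / n


rng = random.Random(0)  # fixed seed so the run is repeatable
estimate = importance_sampling_estimate(
    phi=lambda x: x * x,
    p_pdf=lambda x: normal_pdf(x, 0.0, 1.0),   # assumed target p(x)
    q_pdf=lambda x: normal_pdf(x, 0.0, 2.0),   # assumed sampling function q(x)
    q_sampler=lambda r: r.gauss(0.0, 2.0),
    n=100_000,
    rng=rng,
)
```

With these choices the estimate should lie close to 1, the variance of the standard normal, even though no sample is drawn from $p (x)$ itself.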

Rejection sampling strategy

The rejection sampling strategy is a particular case of importance sampling [37]. The rejection of a sample is determined by specific prerequisites. At the opposite end of rejection sampling is the selection of important samples for training. As described in Section 4, the generalized recognition confidence reflects the sample importance. The extended general recognition confidence is used to measure the samples. If a training sample is recognized with confidence $R_{i j}$ lower than $T h$ , it will be
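
The selection step can be sketched as a simple threshold filter. The snippet above is cut off before it states what happens to a low-confidence sample, so the sketch assumes, following the abstract's description of selecting important samples as a training subset, that samples with confidence below the threshold are the ones kept. The function name `select_important_samples` and the toy values are hypothetical.

```python
def select_important_samples(samples, confidences, threshold):
    """Keep the samples whose recognition confidence is below the threshold.

    Assumption for illustration: low confidence marks a sample as important,
    so it is retained in the training subset; the rest are rejected.
    """
    if len(samples) != len(confidences):
        raise ValueError("samples and confidences must have the same length")
    return [s for s, r in zip(samples, confidences) if r < threshold]


# Toy usage: three samples with made-up confidence values and threshold.
subset = select_important_samples(["s1", "s2", "s3"], [0.9, 0.2, 0.5], 0.5)
```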

Sample importance weight estimated under the boosting algorithm

The boosting method is widely applied in face detection and binary classification tasks. The method provides high accuracy with fast speed and has been extended to multi-class classification problems. There are many versions of multi-class AdaBoost [41], [42], [43], [44], [45]. The weighting functions of these algorithms are mainly divided into two categories. The first category involves updating the sample weight according to whether a sample is recognized correctly, e.g., discrete AdaBoost
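
For the first category, the textbook discrete AdaBoost (AdaBoost.M1) weight update can be sketched as follows. This is the generic form of the algorithm and is not necessarily the variant used in the paper; the function name `adaboost_m1_reweight` is an assumption for this example.

```python
import math


def adaboost_m1_reweight(weights, correct):
    """One round of the discrete AdaBoost (AdaBoost.M1) sample re-weighting.

    weights: current non-negative sample weights.
    correct: booleans, True where the current classifier was right.
    Returns the normalized new weights and the classifier coefficient alpha.
    """
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted error of the current classifier.
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    if not 0.0 < err < 0.5:
        raise ValueError("AdaBoost.M1 requires a weighted error in (0, 0.5)")
    alpha = math.log((1.0 - err) / err)
    # Misclassified samples are scaled up; correct ones keep their weight.
    updated = [w if ok else w * math.exp(alpha) for w, ok in zip(weights, correct)]
    norm = sum(updated)
    return [w / norm for w in updated], alpha


new_weights, alpha = adaboost_m1_reweight(
    [0.25, 0.25, 0.25, 0.25], [True, True, True, False]
)
```

After the update the misclassified samples together carry half of the total weight, so the next classifier is pushed to concentrate on them.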

Sample importance weight estimated under MCE

In Sections 5 and 6, the sample importance weights are estimated by the rejection sampling and boosting methods, respectively. In this section, we directly estimate the parameter $σ$ of the sample importance function under the MCE rule.

The determination of $σ$ has a significant impact on the distribution of $π_{w} (R)$ . As illustrated in Fig. 3, if the distribution of $R$ is fixed, the sample importance weight is a function

Datasets and experiment setup

HCL2000 (Handwritten Character Library 2000) is a published dataset [11]. A regular training set contains 700 subsets, from xx001 to xx700. The testing set contains 300 subsets, from hh001 to hh300. Each subset includes 3755 level-1 simplified Chinese characters of GB2312-80. This dataset is widely used for recognition algorithm investigation. Some of the sample characters are illustrated in Fig. 5.

The THU-HCD dataset is collected by the Center for Intelligent Image and Document Information

Conclusion

This paper presented an importance sampling based discriminative learning framework for large scale classification. Three sample importance weight estimation methods were proposed: rejection sampling, boosting and MCE. All of these methods employ MQDF as the baseline classifier and indirectly optimize the classifier parameters under the importance sampling based discriminative learning framework. These methods were tested using three Chinese handwriting datasets. The experimental

Conflict of interest statement

The authors declare that they have no conflicts of interest.

Yanwei Wang received his B.S. degree in Electronic and Information Engineering from Jilin University in 2007 and his Ph.D. degree in Electronic Engineering from Tsinghua University in 2013. He is currently a postdoctoral researcher at Tsinghua University. His current interests include offline Chinese character and text line recognition.

References (47)

F. Kimura et al., Improvement of handwritten Japanese character recognition using weighted direction code histogram, Pattern Recognit. (1997)
C.-L. Liu et al., Handwritten digit recognition: investigation of normalization and feature extraction techniques, Pattern Recognit. (2004)
T.-F. Gao et al., High accuracy handwritten Chinese character recognition using LDA-based compound distances, Pattern Recognit. (2008)
T. Long et al., Building compact MQDF classifier for large character set recognition by subspace distribution sharing, Pattern Recognit. (2008)
J.X. Dong et al., An improved handwritten Chinese character recognition system using support vector machine, Pattern Recognit. Lett. (2005)
X.-F. Lin et al., Adaptive confidence transform based classifier combination for Chinese character recognition, Pattern Recognit. Lett. (1998)
Y.G. Chen, Another look at rejection sampling through importance sampling, Stat. Probab. Lett. (2005)
C. Sima et al., The peaking phenomenon in the presence of feature-selection, Pattern Recognit. Lett. (2008)
C.-L. Liu et al., Online and offline handwritten Chinese character recognition: benchmarking on new databases, Pattern Recognit. (2013)
N. Kato et al., A handwritten character recognition system using directional element feature and asymmetric Mahalanobis distance, IEEE Trans. Pattern Anal. Mach. Intell. (1999)
S. Meng et al., Handwritten numeral recognition using gradient and curvature of gray scale image, Pattern Recognit. (2002)
F. Kimura et al., Modified quadratic discriminant functions and its application to Chinese character recognition, IEEE Trans. Pattern Anal. Mach. Intell. (1987)
H.-L. Liu, X.-Q. Ding, Handwritten character recognition using gradient feature and quadratic classifier with multiple...
X.-B. Liu, Y.-D. Jia, M. Tan, Geometrical statistical modeling of character structures for natural stroke extraction...
H.-G. Zhang, J. Yang, W.-H. Deng, J. Guo, Handwritten Chinese character recognition using local discriminate projection...
H.-G. Zhang, J. Guo, G. Chen, C.-G. Li, HCL2000 a large scale handwritten Chinese character database for handwritten...
F. Yin, Q.-F. Wang, X.-Y. Z, C.-L. Liu, ICDAR 2013 Chinese handwriting recognition competition, in: Proceedings of the...
C.-L. Liu, F. Yin, D.-H. Wang, Q.-F. Wang, Online and offline handwritten Chinese character recognition: benchmarking...
R.O. Duda et al., Pattern Classification (2001)
Y.-W. Wang, X.-Q. Ding, C.S. Liu, Discriminative learning based offline handwritten Chinese character recognition, in:...
T. Hamamura, B. Irie, T. Nishimoto, N. Ono, S. Sagayama, Concurrent optimization of context clustering and GMM for...
J. Friedman et al., Additive logistic regression: a statistical view of boosting, Ann. Stat. (2000)
C. Cortes et al., Support-vector networks, Mach. Learn. (1995)

Qiang Fu received his B.S. degree in 2002 and his Ph.D. degree in Electronic Engineering in 2008 from Tsinghua University. He has been working at Microsoft Research Asia as an Associate Researcher since 2008. His research interests include pattern recognition, data mining, and log analysis for system performance diagnosis.

Xiaoqing Ding (SM’07) received the B.E. degree from Tsinghua University, Beijing, China, in 1962. She is currently a Professor and Ph.D. Supervisor in the Department of Electronic Engineering, Tsinghua University. She is an IAPR Fellow and an IEEE Fellow. Her research interests include pattern recognition, image processing, character recognition, biometric identification, computer vision and video surveillance.

Changsong Liu is an Associate Professor in the Department of Electronics Engineering at Tsinghua University. He received the B.S. degree in both Mechanics Engineering and Electronics Engineering in 1992, the Master degree in Electronics Engineering in 1995, and the Ph.D. degree in 2007 from Tsinghua University, China. His research interests include image processing and pattern recognition.
