Abstract
A primary cause of performance degradation in unconstrained online handwritten Chinese character recognition is the subtle differences between similar characters. Various methods have been proposed in previous works to address the problem of generating similar characters. These methods generally comprise two components: similar character discovery and cascaded classifiers. The goal of similar character discovery is to make the similar character pairs/sets cover as many misclassified samples as possible. We observe that the confidence of a convolutional neural network (CNN) is produced in an end-to-end manner and can be understood as a type of probability metric. In this paper, we propose an algorithm that leverages CNN confidence to discover similar character pairs/sets. Specifically, a deep CNN is applied to output the top-ranked candidates and their corresponding confidence scores, followed by an accumulating and averaging procedure. We found experimentally that the number of similar character pairs differs from class to class and that the degree of confusion varies across similar character pairs. To address these problems, we propose an entropy-based similarity measurement to rank these similar character pairs/sets and reject those with low similarity. The experimental results indicate that, using 30,000 similar character pairs, our method achieves hit rates of 98.44% and 98.05% on the CASIA-OLHWDB1.0 and CASIA-OLHWDB1.0–1.2 datasets, respectively, which are significantly higher than the corresponding results produced by the MQDF-based method (95.42% and 94.49%). Furthermore, recognition of ten randomly selected similar character subsets with a two-stage classification scheme yields a relative error reduction of 30.11% compared with the traditional single-stage scheme, showing the potential of the proposed method.
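The discovery procedure summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `top_k` and `min_score` parameters, the use of the averaged confidence itself as the pair score, and the definition of hit rate as the fraction of misclassified samples covered by the discovered pairs are all assumptions made for illustration. The abstract does not specify the entropy-based similarity measurement, so `row_entropy` is only a stand-in showing how the confusion degree of a class could be quantified.

```python
import numpy as np

def average_confidences(probs, labels, num_classes, top_k=3):
    """Accumulate the top-k confidences per true class, then average per class."""
    probs = np.asarray(probs, dtype=float)
    acc = np.zeros((num_classes, num_classes))
    counts = np.zeros(num_classes)
    for p, y in zip(probs, labels):
        top = np.argsort(p)[::-1][:top_k]   # indices of the top-k candidates
        acc[y, top] += p[top]               # accumulate their confidences
        counts[y] += 1
    # Classes with no samples keep an all-zero row.
    return acc / np.maximum(counts, 1)[:, None]

def row_entropy(avg):
    """Shannon entropy of each normalized row (assumed stand-in for confusion degree)."""
    row_sums = avg.sum(axis=1, keepdims=True)
    p = np.divide(avg, row_sums, out=np.zeros_like(avg), where=row_sums > 0)
    logp = np.log(p, out=np.zeros_like(p), where=p > 0)
    return -(p * logp).sum(axis=1)

def rank_pairs(avg, min_score=0.05):
    """Rank ordered pairs (true class, confused class) and reject low scores."""
    n = avg.shape[0]
    pairs = [(i, j, float(avg[i, j]))
             for i in range(n) for j in range(n)
             if i != j and avg[i, j] >= min_score]
    return sorted(pairs, key=lambda t: t[2], reverse=True)

def hit_rate(pairs, labels, predictions):
    """Fraction of misclassified samples covered by the discovered pairs (assumed definition)."""
    pair_set = {(i, j) for i, j, _ in pairs}
    errors = [(y, p) for y, p in zip(labels, predictions) if y != p]
    if not errors:
        return 1.0
    return sum(e in pair_set for e in errors) / len(errors)

if __name__ == "__main__":
    # Toy softmax outputs for three samples over three classes.
    toy_probs = [[0.7, 0.3, 0.0], [0.4, 0.6, 0.0], [0.15, 0.05, 0.8]]
    toy_labels = [0, 0, 2]
    avg = average_confidences(toy_probs, toy_labels, num_classes=3, top_k=2)
    pairs = rank_pairs(avg)
    preds = [int(np.argmax(p)) for p in toy_probs]
    print(pairs, row_entropy(avg), hit_rate(pairs, toy_labels, preds))
```

In this sketch the pair list is ordered, so (i, j) and (j, i) are scored separately; a symmetric score would be an equally plausible choice.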
Acknowledgments
The authors thank all reviewers for their valuable comments on improving the quality of this paper. This research is supported in part by NSFC (Grant No. 61472144), the National Science and Technology Support Plan (Grant Nos. 2013BAH65F01, 2013BAH65F04), GDSTP (Grant Nos. 2013B010202004, 2014A010103012, 2015B010101004, 2015B010130003), GDUPS (2011), and the Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20120172110023).
Cite this article
Zhang, S., Jin, L. & Lin, L. Discovering similar Chinese characters in online handwriting with deep convolutional neural networks. IJDAR 19, 237–252 (2016). https://doi.org/10.1007/s10032-016-0268-0
DOI: https://doi.org/10.1007/s10032-016-0268-0