Abstract
This paper proposes an effective segmentation-free approach to offline handwritten Chinese text recognition (HCTR) using a hybrid neural network hidden Markov model (NN-HMM). Within a Bayesian framework, a handwritten Chinese text line is modeled sequentially by HMMs, each representing one character class, while an NN-based classifier computes the posterior probabilities of all HMM states. Key issues in feature extraction, character modeling, and language modeling are investigated comprehensively to demonstrate the effectiveness of the NN-HMM framework for offline HCTR. First, a conventional deep neural network (DNN) architecture is studied with a well-designed feature extractor. In training, label refinement using forced alignment and sequence training both yield significant gains over training with the frame-level cross-entropy criterion alone. Second, a deep convolutional neural network (DCNN) that learns discriminative features automatically proves superior to the DNN in the HMM framework. Moreover, to distinguish the many highly confusable classes that arise from the large vocabulary of Chinese characters, the NN-based classifier should output 19,900 HMM states as classification units, obtained through high-resolution modeling within each character. On the ICDAR 2013 competition task of the CASIA-HWDB database, DNN-HMM achieves a promising character error rate (CER) of 5.24% while striking a good trade-off between computational complexity and recognition accuracy. To the best of our knowledge, DCNN-HMM achieves the best published CER of 3.53%.
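The core mechanism described above, an NN that outputs HMM state posteriors inside an HMM decoder, can be illustrated with a small sketch. It assumes the common hybrid recipe of dividing each state posterior p(s|x) by the state prior p(s) to obtain a scaled likelihood, and a simple left-to-right topology with fixed 0.5 self-loop and advance probabilities. The abstract does not specify the topology, the number of states per character, or how priors are estimated, and the function names `scaled_log_likelihoods` and `forced_align` are hypothetical. Forced alignment of this kind is one way to assign each frame a state label given a known transcript, which is the role it plays in label refinement.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def scaled_log_likelihoods(
    log_posteriors: Sequence[Sequence[float]], log_priors: Sequence[float]
) -> List[List[float]]:
    """Convert NN state posteriors into HMM emission scores.

    Hybrid assumption: p(x_t | s) is proportional to p(s | x_t) / p(s),
    so in the log domain the state log-prior is subtracted.
    """
    return [[lp - pr for lp, pr in zip(frame, log_priors)] for frame in log_posteriors]


def forced_align(
    emissions: Sequence[Sequence[float]],
    state_seq: Sequence[int],
    log_self: float = math.log(0.5),  # assumed self-loop probability
    log_next: float = math.log(0.5),  # assumed advance probability
) -> Tuple[List[int], float]:
    """Viterbi alignment of T frames to a fixed left-to-right chain of states.

    state_seq holds the HMM state ids of the transcript (e.g. the states of
    each character, concatenated). At every frame the path either stays in
    its chain position or advances by one; it must start in the first
    position and end in the last. Returns the per-frame state ids and the
    best path log score.
    """
    T, N = len(emissions), len(state_seq)
    if T < N:
        raise ValueError("need at least one frame per state")
    delta = [[NEG_INF] * N for _ in range(T)]  # best log score ending at (t, j)
    back = [[0] * N for _ in range(T)]         # predecessor chain position
    delta[0][0] = emissions[0][state_seq[0]]
    for t in range(1, T):
        for j in range(N):
            stay = delta[t - 1][j] + log_self
            move = delta[t - 1][j - 1] + log_next if j > 0 else NEG_INF
            if stay >= move:
                best, back[t][j] = stay, j
            else:
                best, back[t][j] = move, j - 1
            delta[t][j] = best + emissions[t][state_seq[j]]
    # Trace back from the final chain position at the last frame.
    pos = [N - 1]
    for t in range(T - 1, 0, -1):
        pos.append(back[t][pos[-1]])
    pos.reverse()
    return [state_seq[j] for j in pos], delta[T - 1][N - 1]


if __name__ == "__main__":
    # Toy example: 5 frames, a 3-state chain, uniform state priors.
    posteriors = [
        [0.8, 0.1, 0.1],
        [0.7, 0.2, 0.1],
        [0.1, 0.8, 0.1],
        [0.1, 0.2, 0.7],
        [0.1, 0.1, 0.8],
    ]
    log_post = [[math.log(p) for p in frame] for frame in posteriors]
    log_prior = [math.log(1 / 3)] * 3
    path, score = forced_align(scaled_log_likelihoods(log_post, log_prior), [0, 1, 2])
    print(path)  # [0, 0, 1, 2, 2]
```

In a real system the state inventory is far larger (19,900 states in this study), recognition would search over a lexicon and language model rather than a single known transcript, and p(s) is commonly estimated from state frequencies in the training alignments; those parts are outside this sketch.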

Acknowledgements
This work was supported in part by the National Key R&D Program of China under Contract No. 2017YFB1002202, the National Natural Science Foundation of China under Grant Nos. 61671422 and U1613211, the Key Science and Technology Project of Anhui Province under Grant No. 17030901005, and MOE-Microsoft Key Laboratory of USTC.
About this article
Cite this article
Wang, ZR., Du, J., Wang, WC. et al. A comprehensive study of hybrid neural network hidden Markov model for offline handwritten Chinese text recognition. IJDAR 21, 241–251 (2018). https://doi.org/10.1007/s10032-018-0307-0
DOI: https://doi.org/10.1007/s10032-018-0307-0