A comprehensive study of hybrid neural network hidden Markov model for offline handwritten Chinese text recognition

  • Original Paper
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

This paper proposes an effective segmentation-free approach to offline handwritten Chinese text recognition (HCTR) using a hybrid neural network hidden Markov model (NN-HMM). Within a general Bayesian framework, a handwritten Chinese text line is modeled sequentially by HMMs, each representing one character class, while an NN-based classifier computes the posterior probabilities of all HMM states. The key issues in feature extraction, character modeling, and language modeling are investigated comprehensively to show the effectiveness of the NN-HMM framework for offline HCTR. First, a conventional deep neural network (DNN) architecture is studied with a well-designed feature extractor. For training, label refinement using forced alignment and sequence training both yield significant gains on top of the frame-level cross-entropy criterion. Second, a deep convolutional neural network (DCNN), which learns discriminative features automatically, outperforms the DNN within the HMM framework. Moreover, to distinguish the highly confusable classes that arise from the large vocabulary of Chinese characters, the NN-based classifier should output 19,900 HMM states as classification units, obtained through high-resolution modeling within each character. On the ICDAR 2013 competition task of the CASIA-HWDB database, DNN-HMM yields a promising character error rate (CER) of 5.24% while offering a good trade-off between computational complexity and recognition accuracy. DCNN-HMM achieves a CER of 3.53%, which is, to the best of our knowledge, the best published result.
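The hybrid idea above can be illustrated with a minimal sketch, assuming NumPy. It covers two ingredients the abstract names: turning the NN's state posteriors into scores an HMM can use, and forced alignment of frames to the HMM states of a known transcript (the step behind label refinement). The conversion uses the standard hybrid convention of dividing posteriors by state priors; the abstract does not spell this step out, so treat it as an assumption. Transition probabilities are ignored (treated as uniform), sequence training and language modeling are not covered, and the function names `scaled_log_likelihoods` and `forced_align` are hypothetical, not taken from the paper.

```python
import numpy as np


def scaled_log_likelihoods(log_posteriors: np.ndarray, log_priors: np.ndarray) -> np.ndarray:
    """Convert NN state posteriors into scaled log-likelihoods.

    Assumed hybrid rule: p(x_t | s) is proportional to P(s | x_t) / P(s), so
    log p(x_t | s) = log P(s | x_t) - log P(s) up to a per-frame constant.
    log_posteriors: (T, S) NN outputs in the log domain; log_priors: (S,).
    """
    return log_posteriors - log_priors[np.newaxis, :]


def forced_align(log_lik: np.ndarray, state_seq: list[int]) -> list[int]:
    """Viterbi forced alignment of T frames to a fixed left-to-right state sequence.

    state_seq lists the HMM states of the known transcript in order (all states
    of the first character, then all states of the second, and so on). At each
    frame the path either stays in its current state (self-loop) or advances by
    one. Transition probabilities are ignored here (assumed uniform).
    Returns the aligned state index for every frame.
    """
    T, n = log_lik.shape[0], len(state_seq)
    if n > T:
        raise ValueError("transcript needs more states than there are frames")
    dp = np.full((T, n), -np.inf)       # best log score ending at position j at frame t
    back = np.zeros((T, n), dtype=int)  # predecessor position, used for backtracking
    dp[0, 0] = log_lik[0, state_seq[0]]  # the path must start in the first state
    for t in range(1, T):
        for j in range(n):
            stay = dp[t - 1, j]
            move = dp[t - 1, j - 1] if j > 0 else -np.inf
            if move > stay:
                dp[t, j], back[t, j] = move, j - 1
            else:
                dp[t, j], back[t, j] = stay, j
            dp[t, j] += log_lik[t, state_seq[j]]
    # The path must end in the last state; trace predecessors back to frame 0.
    positions = [n - 1]
    for t in range(T - 1, 0, -1):
        positions.append(int(back[t, positions[-1]]))
    positions.reverse()
    return [state_seq[j] for j in positions]


if __name__ == "__main__":
    # Toy example: 6 frames, 3 states; each pair of frames clearly favors one state.
    post = np.array([[0.8, 0.1, 0.1]] * 2 + [[0.1, 0.8, 0.1]] * 2 + [[0.1, 0.1, 0.8]] * 2)
    priors = np.full(3, 1 / 3)
    ll = scaled_log_likelihoods(np.log(post), np.log(priors))
    print(forced_align(ll, [0, 1, 2]))  # [0, 0, 1, 1, 2, 2]
```

In a real system the posteriors would come from the DNN or DCNN, and the priors would typically be estimated from state frequencies in the aligned training labels; the re-aligned per-frame states can then serve as refined cross-entropy targets. Decoding an unknown text line would replace the fixed `state_seq` with a search over all character HMMs combined with a language model.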

Acknowledgements

This work was supported in part by the National Key R&D Program of China under Contract No. 2017YFB1002202, the National Natural Science Foundation of China under Grant Nos. 61671422 and U1613211, the Key Science and Technology Project of Anhui Province under Grant No. 17030901005, and MOE-Microsoft Key Laboratory of USTC.

Author information

Correspondence to Jun Du.

About this article

Cite this article

Wang, ZR., Du, J., Wang, WC. et al. A comprehensive study of hybrid neural network hidden Markov model for offline handwritten Chinese text recognition. IJDAR 21, 241–251 (2018). https://doi.org/10.1007/s10032-018-0307-0

  • DOI: https://doi.org/10.1007/s10032-018-0307-0