A comprehensive study of hybrid neural network hidden Markov model for offline handwritten Chinese text recognition

Wang, Zi-Rui; Du, Jun; Wang, Wen-Chao; Zhai, Jian-Fang; Hu, Jin-Shui

doi:10.1007/s10032-018-0307-0

Original Paper
Published: 15 June 2018

Volume 21, pages 241–251, (2018)
International Journal on Document Analysis and Recognition (IJDAR)

Abstract

This paper proposes an effective segmentation-free approach to offline handwritten Chinese text recognition (HCTR) using a hybrid neural network hidden Markov model (NN-HMM). Within the general Bayesian framework, a handwritten Chinese text line is modeled sequentially by HMMs, each representing one character class, while an NN-based classifier computes the posterior probabilities of all HMM states. The key issues in feature extraction, character modeling, and language modeling are comprehensively investigated to show the effectiveness of the NN-HMM framework for offline HCTR. First, a conventional deep neural network (DNN) architecture is studied with a well-designed feature extractor. In training, label refinement using forced alignment and sequence training yield significant gains on top of the frame-level cross-entropy criterion. Second, a deep convolutional neural network (DCNN) with automatically learned discriminative features demonstrates its superiority to the DNN in the HMM framework. Moreover, to address the challenging problem of distinguishing highly confusable classes arising from the large vocabulary of Chinese characters, the NN-based classifier should output 19900 HMM states as the classification units via high-resolution modeling within each character. On the ICDAR 2013 competition task of the CASIA-HWDB database, DNN-HMM yields a promising character error rate (CER) of 5.24%, making a good trade-off between computational complexity and recognition accuracy. To the best of our knowledge, the CER of 3.53% achieved by DCNN-HMM is the best published result.
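The CER figures quoted above can be made concrete with a small sketch. It assumes the common edit-distance definition of CER (substitutions, deletions, and insertions divided by the number of reference characters); the abstract does not spell out the exact scoring protocol, and the function names `edit_distance` and `character_error_rate` are illustrative, not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between a reference and a hypothesis sequence."""
    # prev[j] holds the distance between the first i-1 reference
    # characters and the first j hypothesis characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # reference character deleted
                            curr[j - 1] + 1,      # hypothesis character inserted
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(refs, hyps):
    """Total edit errors over all lines divided by total reference characters."""
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return total_errors / total_chars


if __name__ == "__main__":
    # Hypothetical text line with one substituted character out of six.
    refs = ["手写汉字识别"]
    hyps = ["手写汉子识别"]
    print(f"CER = {character_error_rate(refs, hyps):.2%}")
```

Under this definition, a CER of 3.53% corresponds to roughly 35 character errors per 1000 reference characters, accumulated over the whole test set rather than averaged per line.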

Acknowledgements

This work was supported in part by the National Key R&D Program of China under Contract No. 2017YFB1002202, the National Natural Science Foundation of China under Grant Nos. 61671422 and U1613211, the Key Science and Technology Project of Anhui Province under Grant No. 17030901005, and the MOE-Microsoft Key Laboratory of USTC.

Author information

Authors and Affiliations

University of Science and Technology of China, Hefei, Anhui, People’s Republic of China
Zi-Rui Wang, Jun Du & Wen-Chao Wang
iFlytek Research, Hefei, Anhui, People’s Republic of China
Jian-Fang Zhai & Jin-Shui Hu

Corresponding author

Correspondence to Jun Du.

About this article

Cite this article

Wang, ZR., Du, J., Wang, WC. et al. A comprehensive study of hybrid neural network hidden Markov model for offline handwritten Chinese text recognition. IJDAR 21, 241–251 (2018). https://doi.org/10.1007/s10032-018-0307-0

Received: 28 January 2018
Revised: 29 May 2018
Accepted: 02 June 2018
Published: 15 June 2018
Issue Date: December 2018
DOI: https://doi.org/10.1007/s10032-018-0307-0