Statistical and Hybrid Methods for Speech Recognition in Romanian

International Journal of Speech Technology

Abstract

This paper describes the evolution of our work on speech recognition. Starting from a classical hidden Markov model (HMM), we investigated two ways to improve the performance of this basic structure. The first was to build a neuro-statistical hybrid by integrating a multilayer perceptron (MLP) as an a posteriori probability estimator. The system was further refined by adding supplementary discriminative training (DT) based on the minimum classification error (MCE) criterion. Tests performed on a database of 15,000 isolated spoken words showed an increase in the recognition rate from 92.2% for the HMM-based recognition system to 94.7% for the HMM-MLP system, and then to 98.1% for the refined HMM-MLP-DT system. The second was to build a fuzzy-statistical hybrid, the FHMM, based on a fuzzy similarity measure in place of the probabilistic measure used by the usual statistical model. The benefits of introducing the fuzzy measure were evaluated on a vowel recognition task, where a decrease of approximately 3% in the error rate is reported.
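The recognition rates quoted above can also be read as error rates, which makes the size of each improvement easier to compare. The following is a minimal sketch of that arithmetic only. The function and variable names are illustrative assumptions and do not come from the paper, and the code says nothing about how the systems themselves were built or evaluated.

```python
# Recognition rates (%) as quoted in the abstract.
# The dictionary and function names are hypothetical, chosen for illustration.
RATES = {"HMM": 92.2, "HMM-MLP": 94.7, "HMM-MLP-DT": 98.1}


def error_rate(recognition_rate: float) -> float:
    """Error rate in percent, taken as the complement of the recognition rate."""
    return 100.0 - recognition_rate


def relative_error_reduction(old_rate: float, new_rate: float) -> float:
    """Fraction of the old system's errors that the new system removes."""
    old_err = error_rate(old_rate)
    new_err = error_rate(new_rate)
    return (old_err - new_err) / old_err


if __name__ == "__main__":
    steps = [
        ("HMM", "HMM-MLP"),
        ("HMM-MLP", "HMM-MLP-DT"),
        ("HMM", "HMM-MLP-DT"),
    ]
    for old, new in steps:
        reduction = relative_error_reduction(RATES[old], RATES[new])
        print(f"{old} -> {new}: {100 * reduction:.1f}% fewer errors")
```

Under this reading, the error rate falls from 7.8% to 5.3% and then to 1.9%, so the discriminative training step removes a larger share of the remaining errors than the MLP integration step did.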

Cite this article

Valsan, Z., Gavat, I., Sabac, B. et al. Statistical and Hybrid Methods for Speech Recognition in Romanian. International Journal of Speech Technology 5, 259–268 (2002). https://doi.org/10.1023/A:1020249008539

  • DOI: https://doi.org/10.1023/A:1020249008539