Tied Mixture Modeling in Hindi Speech Recognition System

  • Conference paper
Information and Communication Technologies (ICT 2010)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 101)

Abstract

The goal of automatic speech recognition (ASR) is to convert a speech signal into a text message accurately and efficiently, independent of the device, speaker or environment. In ASR, the speech signal is captured and parameterized at the front end and evaluated at the back end using a Gaussian mixture hidden Markov model (HMM). In statistical modeling, similar states are tied to keep the large number of HMM state parameters manageable and to reduce the computational overhead. In this paper we present a scheme for finding the degree of mixture tying best suited to the small amount of training data that is usually available for Indian languages. In our proposed approach, perceptual linear prediction (PLP) combined with heteroscedastic linear discriminant analysis (HLDA) was used for feature extraction. All the experiments were conducted in general field conditions and in the context of Indian languages, specifically Hindi, and the Indian speaking style.
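
The effect of tying on parameter count and computation can be illustrated with a minimal sketch. It assumes the fully tied extreme, in which every state shares one codebook of diagonal-covariance Gaussians and keeps only its own mixture weights; the paper itself studies intermediate degrees of tying, so this is not the authors' configuration. The function names, the state count, and the feature dimension below are hypothetical and chosen only for illustration.

```python
import math


def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def tied_mixture_loglik(x, codebook, state_weights):
    """Log-likelihood of x for every state that shares one Gaussian codebook.

    codebook: list of (mean, var) pairs shared by all states (assumed layout).
    state_weights: dict mapping state name -> list of positive mixture weights.
    """
    # Each shared Gaussian is evaluated once per frame, not once per state.
    comp_logp = [diag_gaussian_logpdf(x, m, v) for m, v in codebook]
    peak = max(comp_logp)
    result = {}
    for state, weights in state_weights.items():
        # Weighted log-sum-exp over the shared components.
        total = sum(w * math.exp(lp - peak) for w, lp in zip(weights, comp_logp))
        result[state] = peak + math.log(total)
    return result


def parameter_counts(n_states, n_mix, dim):
    """Stored parameters (means, variances, weights) without and with tying."""
    untied = n_states * n_mix * (2 * dim + 1)
    tied = n_mix * 2 * dim + n_states * n_mix
    return untied, tied


if __name__ == "__main__":
    # Hypothetical sizes: 100 states, 8 mixtures, 39-dimensional features.
    print(parameter_counts(100, 8, 39))
```

With tying, the Gaussian means and variances are stored and evaluated once for all states, so only the per-state weights grow with the number of states. This is why tying helps when training data is scarce: fewer parameters have to be estimated from the same amount of speech.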

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Aggarwal, R.K., Dave, M. (2010). Tied Mixture Modeling in Hindi Speech Recognition System. In: Das, V.V., Vijaykumar, R. (eds) Information and Communication Technologies. ICT 2010. Communications in Computer and Information Science, vol 101. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15766-0_86

  • DOI: https://doi.org/10.1007/978-3-642-15766-0_86

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15765-3

  • Online ISBN: 978-3-642-15766-0

  • eBook Packages: Computer Science, Computer Science (R0)
