Tied Mixture Modeling in Hindi Speech Recognition System

Aggarwal, R. K.; Dave, M.

doi:10.1007/978-3-642-15766-0_86


Part of the book series: Communications in Computer and Information Science (CCIS, volume 101)

Included in the following conference series:

International Conference on Advances in Information and Communication Technologies


Abstract

The goal of automatic speech recognition (ASR) is to convert a speech signal into text accurately and efficiently, independent of the device, speaker, or environment. In ASR, the speech signal is captured and parameterized at the front end and evaluated at the back end using Gaussian mixture hidden Markov models (HMMs). In statistical modeling, similar states are tied to keep the large number of HMM state parameters manageable and to minimize computational overhead. In this paper we present a scheme to find the degree of mixture tying best suited to the small amount of training data usually available for Indian languages. In our proposed approach, perceptual linear prediction (PLP) combined with heteroscedastic linear discriminant analysis (HLDA) was used for feature extraction. All experiments were conducted under general field conditions, in the context of Indian languages (specifically Hindi), and for the Indian speaking style.
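The mixture tying described in the abstract can be illustrated with a minimal tied-mixture (semi-continuous) HMM sketch. It assumes a single codebook of diagonal-covariance Gaussians shared by every state, with each state owning only its mixture weights. The abstract does not describe the paper's actual procedure for choosing the degree of tying, and every name below (`TiedMixtureCodebook`, `state_log_output_prob`, the parameter-count helpers) is hypothetical. The sketch shows the two savings the abstract alludes to: the shared Gaussians are evaluated once per frame and reused by all states, and the parameter count drops because means and variances are no longer duplicated per state.

```python
import math
from typing import List, Sequence


def log_gaussian_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a diagonal-covariance Gaussian N(x; mean, diag(var))."""
    total = 0.0
    for xi, mi, vi in zip(x, mean, var):
        total += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return total


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


class TiedMixtureCodebook:
    """Hypothetical pool of Gaussians shared by every HMM state."""

    def __init__(self, means: List[List[float]], variances: List[List[float]]):
        self.means = means
        self.variances = variances

    def frame_log_likelihoods(self, x: Sequence[float]) -> List[float]:
        # Computed once per frame, then reused by every state.
        return [log_gaussian_diag(x, m, v) for m, v in zip(self.means, self.variances)]


def state_log_output_prob(codebook_ll: Sequence[float], weights: Sequence[float]) -> float:
    """log b_s(o) = log sum_k c_{s,k} N(o; mu_k, Sigma_k), using shared codebook scores."""
    terms = [math.log(w) + ll for w, ll in zip(weights, codebook_ll) if w > 0.0]
    return log_sum_exp(terms)


def param_count_continuous(num_states: int, mix_per_state: int, dim: int) -> int:
    # Each state owns M means (D values), M variances (D values) and M weights.
    return num_states * mix_per_state * (2 * dim + 1)


def param_count_tied(num_states: int, codebook_size: int, dim: int) -> int:
    # Shared codebook: K means and K variances; each state owns only K weights.
    return codebook_size * 2 * dim + num_states * codebook_size


if __name__ == "__main__":
    codebook = TiedMixtureCodebook(
        means=[[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]],
        variances=[[1.0, 1.0], [0.5, 0.5], [2.0, 1.0]],
    )
    state_weights = {"s1": [0.7, 0.2, 0.1], "s2": [0.1, 0.3, 0.6]}
    frame = [0.2, -0.1]
    shared = codebook.frame_log_likelihoods(frame)  # evaluated once for all states
    for name, w in state_weights.items():
        print(name, round(state_log_output_prob(shared, w), 4))
    print(param_count_continuous(1000, 8, 39), param_count_tied(1000, 256, 39))  # 632000 275968
```

The sizes in the example (1000 states, 39-dimensional features, 8 Gaussians per state versus a 256-Gaussian shared codebook) are illustrative only and do not come from the paper. Intermediate degrees of tying, such as one codebook per group of similar states, fall between this fully tied case and a fully continuous HMM.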



Author information

Authors and Affiliations

Department of Computer Engineering, National Institute of Technology, Kurukshetra
R. K. Aggarwal & M. Dave


Editor information

Editors and Affiliations

Engineers Network, Trivandrum, Kerala, India
Vinu V Das
NSS College of Engineering, Palakkadu, India
R. Vijaykumar


About this paper

Cite this paper

Aggarwal, R.K., Dave, M. (2010). Tied Mixture Modeling in Hindi Speech Recognition System. In: Das, V.V., Vijaykumar, R. (eds) Information and Communication Technologies. ICT 2010. Communications in Computer and Information Science, vol 101. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15766-0_86


DOI: https://doi.org/10.1007/978-3-642-15766-0_86
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15765-3
Online ISBN: 978-3-642-15766-0
eBook Packages: Computer Science (R0)
