Abstract
The goal of automatic speech recognition (ASR) is to convert a speech signal into a text message accurately and efficiently, independent of the device, speaker, or environment. In ASR, the speech signal is captured and parameterized at the front end and evaluated at the back end using Gaussian mixture hidden Markov models (HMMs). In statistical modeling, similar states are tied to handle the large number of HMM state parameters and to reduce the computational overhead. In this paper we present a scheme for finding the degree of mixture tying best suited to the small amount of training data usually available for Indian languages. In our proposed approach, perceptual linear prediction (PLP) combined with heteroscedastic linear discriminant analysis (HLDA) was used for feature extraction. All experiments were conducted in general field conditions and in the context of Indian languages, specifically Hindi, and the Indian speaking style.
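As a rough illustration of why mixture tying helps when training data is scarce, the sketch below evaluates a tied-mixture state likelihood and compares parameter counts for an untied system and a fully tied one. It is not the authors' implementation: the function names, the diagonal-covariance assumption, and the example sizes (100 states, 8 mixtures per state, a 64-Gaussian shared codebook, 39-dimensional features) are hypothetical choices made only for this example.

```python
import math


def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def tied_mixture_loglik(x, codebook, weights):
    """Log-likelihood of x for one HMM state under a tied mixture.

    codebook: list of (mean, var) pairs shared by ALL states.
    weights:  mixture weights owned by this state (one per codebook entry).
    """
    terms = [
        math.log(w) + diag_gaussian_logpdf(x, mean, var)
        for w, (mean, var) in zip(weights, codebook)
        if w > 0.0
    ]
    # Log-sum-exp keeps the sum stable for very small densities.
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def parameter_counts(num_states, mixtures_per_state, codebook_size, dim):
    """Parameter counts (untied, tied), counting every weight and
    ignoring the sum-to-one constraint for simplicity."""
    per_gaussian = 2 * dim  # mean vector plus diagonal variance
    untied = num_states * mixtures_per_state * (per_gaussian + 1)
    tied = codebook_size * per_gaussian + num_states * codebook_size
    return untied, tied


# Hypothetical sizes, chosen only to show the effect of tying.
untied, tied = parameter_counts(
    num_states=100, mixtures_per_state=8, codebook_size=64, dim=39
)
print(f"untied parameters: {untied}, tied parameters: {tied}")
```

With fewer parameters to estimate, each shared Gaussian sees data from many states, which is the usual motivation for tying when the corpus is small. Intermediate degrees of tying, such as sharing a codebook within clusters of states, fall between these two extremes.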
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Aggarwal, R.K., Dave, M. (2010). Tied Mixture Modeling in Hindi Speech Recognition System. In: Das, V.V., Vijaykumar, R. (eds) Information and Communication Technologies. ICT 2010. Communications in Computer and Information Science, vol 101. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15766-0_86
DOI: https://doi.org/10.1007/978-3-642-15766-0_86
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15765-3
Online ISBN: 978-3-642-15766-0
eBook Packages: Computer Science (R0)