Abstract
Automatic Speech Recognition (ASR) has been an active research topic for the past four decades. Its main objective is to convert a speech segment into an interpretable text message without human intervention. Many algorithms and schemes based on different mathematical paradigms have been proposed to improve recognition rates. Cepstral coefficients play an important part in speech theory, and in ASR in particular, because they compactly represent the relevant information contained in a short-time sample of a continuous speech signal. This paper compares two speech parameterization methods: Mel-Frequency Cepstrum Coefficients (MFCC) and an improved MFCC that uses RASTA filters. The aim of the improved MFCC algorithm is higher accuracy and lower error rates in ASR. First, we remove signal correlation through normalization; then we filter the cepstral coefficients with a RASTA filter. Finally, we reduce the dimension of the cepstral coefficients according to the variance of the coefficients in each dimension, which yields our features. Using various classifiers, we simulate the speech feature extraction and look for the configuration with the lowest error rate, with the aim of providing a robust method for ASR.
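The three processing steps named in the abstract (normalization, RASTA filtering of the cepstral coefficients, variance-based dimension reduction) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the normalization, so per-coefficient mean and variance normalization is assumed; the filter is the causal form of the classic RASTA transfer function from Hermansky and Morgan, with the pole value 0.98 exposed as a parameter; and "reduction by variance" is assumed to mean keeping the coefficient dimensions with the largest variance. The function names and the `keep` parameter are hypothetical.

```python
import numpy as np


def normalize(cepstra):
    """Per-coefficient mean and variance normalization over time (assumed step)."""
    mean = cepstra.mean(axis=0, keepdims=True)
    std = cepstra.std(axis=0, keepdims=True)
    return (cepstra - mean) / np.maximum(std, 1e-8)


def rasta_filter(cepstra, pole=0.98):
    """Causal RASTA band-pass filter applied along the time axis.

    Implements y[t] = pole * y[t-1]
                      + 0.1 * (2 x[t] + x[t-1] - x[t-3] - 2 x[t-4])
    for every coefficient trajectory. Input shape: (frames, coefficients).
    """
    numer = 0.1 * np.array([2.0, 1.0, 0.0, -1.0, -2.0])
    n_frames, n_coeffs = cepstra.shape
    # Zero history before the first frame (an assumption about start-up).
    padded = np.vstack([np.zeros((4, n_coeffs)), cepstra])
    out = np.zeros((n_frames, n_coeffs))
    prev = np.zeros(n_coeffs)
    for t in range(n_frames):
        window = padded[t:t + 5][::-1]   # rows: x[t], x[t-1], ..., x[t-4]
        prev = numer @ window + pole * prev
        out[t] = prev
    return out


def select_by_variance(features, keep):
    """Keep the `keep` dimensions with the largest variance (assumed criterion)."""
    variances = features.var(axis=0)
    idx = np.sort(np.argsort(variances)[::-1][:keep])
    return features[:, idx], idx


# Illustrative run on random stand-in data (200 frames, 13 coefficients).
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(200, 13))
features, kept = select_by_variance(rasta_filter(normalize(mfcc)), keep=8)
print(features.shape)  # (200, 8)
```

Because the numerator taps sum to zero, a constant offset in any coefficient trajectory decays to zero at the filter output, which is the property that makes RASTA filtering attractive against slowly varying channel effects.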
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Singh, L., Chetty, G. (2012). A Comparative Study of Recognition of Speech Using Improved MFCC Algorithms and Rasta Filters. In: Dua, S., Gangopadhyay, A., Thulasiraman, P., Straccia, U., Shepherd, M., Stein, B. (eds) Information Systems, Technology and Management. ICISTM 2012. Communications in Computer and Information Science, vol 285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29166-1_27
DOI: https://doi.org/10.1007/978-3-642-29166-1_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29165-4
Online ISBN: 978-3-642-29166-1
eBook Packages: Computer Science, Computer Science (R0)