A Comparative Study of Recognition of Speech Using Improved MFCC Algorithms and Rasta Filters

Singh, Lavneet; Chetty, Girija

doi:10.1007/978-3-642-29166-1_27

Part of the book series: Communications in Computer and Information Science (CCIS, volume 285)

Included in the following conference series:

International Conference on Information Systems, Technology and Management

Abstract

Automatic Speech Recognition has been an active topic of research for the past four decades. The main objective of the automatic speech recognition task is to convert a speech segment into an interpretable text message without human intervention. Many algorithms and schemes based on different mathematical paradigms have been proposed in an attempt to improve recognition rates. Cepstral coefficients play an important part in speech theory, and in automatic speech recognition in particular, because they compactly represent the relevant information contained in a short-time sample of a continuous speech signal. The goal of this paper is to compare two speech parameterization methods: Mel-Frequency Cepstrum Coefficients (MFCC) and an improved MFCC variant that uses RASTA filters. In this study, we try to improve the MFCC algorithms to achieve higher accuracy and lower error rates in Automatic Speech Recognition. First, we remove signal correlation through normalization; then we apply a RASTA filter to the cepstral coefficients. Finally, we reduce the dimensionality of the cepstral coefficients according to their variances in each dimension and obtain our features. Using various classifiers, we evaluate this feature extraction scheme with the aim of minimizing the error rate and providing a robust method for Automatic Speech Recognition (ASR).
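
The three post-processing steps named in the abstract (normalization, RASTA filtering, variance-based dimension reduction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the normalization, so per-dimension mean and variance normalization is assumed; the RASTA filter uses the commonly cited band-pass form with numerator 0.1 * [2, 1, 0, -1, -2] and a single pole whose value (0.94 here) is an illustrative choice; and "reduction by variance" is assumed to mean keeping the dimensions with the largest variance. All function names are hypothetical, and the input is assumed to be a list of already computed MFCC frames.

```python
from statistics import fmean, pstdev, pvariance


def normalize(frames):
    """Assumed normalization: zero mean, unit variance per coefficient over time."""
    dims = list(zip(*frames))                      # one tuple per cepstral dimension
    means = [fmean(d) for d in dims]
    stds = [pstdev(d) or 1.0 for d in dims]        # guard against constant dimensions
    return [[(x - m) / s for x, m, s in zip(f, means, stds)] for f in frames]


def rasta_filter(track, pole=0.94):
    """Band-pass IIR filter applied to the time trajectory of one coefficient.

    y[t] = sum_k numer[k] * x[t-k] + pole * y[t-1]; the pole value is an assumption.
    """
    numer = [0.2, 0.1, 0.0, -0.1, -0.2]            # sums to zero, so DC is suppressed
    out = []
    for t in range(len(track)):
        acc = sum(b * track[t - k] for k, b in enumerate(numer) if t - k >= 0)
        if t > 0:
            acc += pole * out[t - 1]
        out.append(acc)
    return out


def select_by_variance(frames, keep):
    """Assumed reduction rule: keep the `keep` dimensions with the largest variance."""
    dims = list(zip(*frames))
    variances = [pvariance(d) for d in dims]
    ranked = sorted(range(len(dims)), key=lambda i: variances[i], reverse=True)
    top = sorted(ranked[:keep])                    # preserve original coefficient order
    return [[f[i] for i in top] for f in frames], top


def extract_features(frames, keep):
    """Chain the three steps in the order the abstract lists them."""
    normed = normalize(frames)
    filtered_dims = [rasta_filter(list(d)) for d in zip(*normed)]
    filtered = [list(f) for f in zip(*filtered_dims)]
    reduced, _ = select_by_variance(filtered, keep)
    return reduced
```

Calling `extract_features(mfcc_frames, keep=8)` on a list of 13-dimensional frames would return frames with 8 coefficients each. Because the filter numerator sums to zero, slowly varying (channel-like) components of each coefficient trajectory are attenuated, which is the usual motivation for RASTA processing.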

Author information

Authors and Affiliations

Faculty of ISE, University of Canberra, Australia
Lavneet Singh & Girija Chetty

Editor information

Editors and Affiliations

Computer Science, College of Engineering and Science, Louisiana Tech University, 71272, Ruston, LA, USA
Sumeet Dua
Department of Information Systems, College of Engineering and Information Technology, UMBC, 1000 Hilltop Circle, 2125, Baltimore, MD, USA
Aryya Gangopadhyay
Department of Computer Science, The University of Manitoba, Winnipeg, MB, Canada
Parimala Thulasiraman
ISTI - CNR, Pisa, Italy
Umberto Straccia
Faculty of Computer Science, Dalhousie University Halifax, B3H 1W5, Nova Scotia, Canada
Michael Shepherd
Faculty of Media: Media Systems, Bauhaus University Weimar, 99421, Weimar, Germany
Benno Stein

Cite this paper

Singh, L., Chetty, G. (2012). A Comparative Study of Recognition of Speech Using Improved MFCC Algorithms and Rasta Filters. In: Dua, S., Gangopadhyay, A., Thulasiraman, P., Straccia, U., Shepherd, M., Stein, B. (eds) Information Systems, Technology and Management. ICISTM 2012. Communications in Computer and Information Science, vol 285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29166-1_27

DOI: https://doi.org/10.1007/978-3-642-29166-1_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29165-4
Online ISBN: 978-3-642-29166-1
eBook Packages: Computer Science, Computer Science (R0)
