A Comparative Study of Recognition of Speech Using Improved MFCC Algorithms and Rasta Filters

  • Conference paper
Information Systems, Technology and Management (ICISTM 2012)

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 285))

Abstract

Automatic Speech Recognition (ASR) has been an active research topic for the past four decades. Its main objective is to convert a speech segment into an interpretable text message without human intervention. Many algorithms and schemes, based on different mathematical paradigms, have been proposed to improve recognition rates. Cepstral coefficients play an important part in speech theory, and in automatic speech recognition in particular, because they compactly represent the relevant information contained in a short-time sample of a continuous speech signal. This paper compares two speech parameterization methods: Mel-Frequency Cepstrum Coefficients (MFCC) and an improved MFCC that uses RASTA filters. In this study, we try to improve the MFCC algorithm to achieve higher accuracy and lower error rates in automatic speech recognition. First, we remove signal correlation through normalization; then we filter the cepstral coefficients with a RASTA filter. Finally, we reduce the dimension of the cepstral coefficients according to the variance of the coefficients in each dimension, and obtain our features. Using various classifiers, we try to simulate speech feature extraction that is as close to optimal as possible with the lowest error rate, providing a robust method for ASR.
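The three steps named in the abstract (normalization, RASTA filtering, variance-based dimension reduction) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not specify the normalization, so per-coefficient mean and variance normalization is assumed; the filter coefficients are the classic RASTA transfer function from Hermansky and Morgan (1994) in causal form; and "reduction by variance" is assumed to mean keeping the dimensions with the largest variance. All function names (`normalize`, `rasta_filter`, `select_by_variance`, `extract_features`) are hypothetical.

```python
from statistics import mean, pstdev

# Assumed coefficients: classic RASTA filter (Hermansky & Morgan, 1994), causal form:
# H(z) = 0.1 * (2 + z^-1 - z^-3 - 2 z^-4) / (1 - 0.98 z^-1)
RASTA_NUM = [0.2, 0.1, 0.0, -0.1, -0.2]
RASTA_POLE = 0.98


def normalize(track):
    """Assumed normalization: zero mean, unit variance for one coefficient trajectory."""
    mu = mean(track)
    sigma = pstdev(track) or 1.0  # avoid dividing by zero on constant trajectories
    return [(x - mu) / sigma for x in track]


def rasta_filter(track):
    """Apply the band-pass IIR filter along time to one coefficient trajectory."""
    out = []
    for n in range(len(track)):
        # FIR part: weighted sum of the current and four previous inputs
        acc = sum(b * track[n - k] for k, b in enumerate(RASTA_NUM) if n - k >= 0)
        # Recursive part: single real pole
        if n > 0:
            acc += RASTA_POLE * out[n - 1]
        out.append(acc)
    return out


def select_by_variance(tracks, keep):
    """Assumed criterion: indices of the `keep` trajectories with the largest variance."""
    ranked = sorted(range(len(tracks)),
                    key=lambda i: pstdev(tracks[i]) ** 2,
                    reverse=True)
    return sorted(ranked[:keep])


def extract_features(cepstra, keep):
    """cepstra: list of frames, each a list of cepstral coefficients (e.g. MFCCs)."""
    tracks = [list(col) for col in zip(*cepstra)]  # one trajectory per coefficient
    tracks = [rasta_filter(normalize(t)) for t in tracks]
    chosen = select_by_variance(tracks, keep)
    features = [[tracks[i][n] for i in chosen] for n in range(len(cepstra))]
    return features, chosen
```

Because the numerator taps sum to zero, the filter suppresses constant (and very slowly varying) components of each cepstral trajectory, which is the property usually cited for RASTA's robustness to fixed channel effects. The MFCC computation itself and the classifiers are outside the scope of this sketch.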




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Singh, L., Chetty, G. (2012). A Comparative Study of Recognition of Speech Using Improved MFCC Algorithms and Rasta Filters. In: Dua, S., Gangopadhyay, A., Thulasiraman, P., Straccia, U., Shepherd, M., Stein, B. (eds) Information Systems, Technology and Management. ICISTM 2012. Communications in Computer and Information Science, vol 285. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29166-1_27

  • DOI: https://doi.org/10.1007/978-3-642-29166-1_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-29165-4

  • Online ISBN: 978-3-642-29166-1

  • eBook Packages: Computer Science, Computer Science (R0)
