Combining Mel Frequency Cepstral Coefficients and Fractal Dimensions for Automatic Speech Recognition

  • Conference paper
Advances in Nonlinear Speech Processing (NOLISP 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7015)

Abstract

Hidden Markov Models and Mel Frequency Cepstral Coefficients (MFCCs) are the de facto standard for Automatic Speech Recognition (ASR) systems, but they fail to capture the nonlinear dynamics present in speech waveforms. The extra information provided by nonlinear features could be especially useful when training data is scarce or when the ASR task is very complex. In this work, the Fractal Dimension (FD) of the observed time series is combined with the traditional MFCCs in the feature vector in order to enhance the performance of two different ASR systems: the first is a very simple system with very few training examples, and the second is a Large Vocabulary Continuous Speech Recognition system for Broadcast News.
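
The idea of appending a fractal dimension to each MFCC feature vector can be illustrated with a minimal sketch. The abstract does not say which FD estimator the authors use, so this example assumes Katz's waveform fractal dimension purely for illustration; the function names `katz_fd` and `append_fd` are hypothetical, and the MFCC vectors are assumed to be computed elsewhere, one per signal frame.

```python
import math
from typing import List, Sequence


def katz_fd(frame: Sequence[float]) -> float:
    """Katz fractal dimension of one frame, viewed as the planar
    curve through the points (i, frame[i]). Illustrative only."""
    n_steps = len(frame) - 1
    if n_steps < 2:
        raise ValueError("frame needs at least three samples")
    # Total curve length: sum of distances between successive points.
    length = sum(
        math.hypot(1.0, frame[i + 1] - frame[i]) for i in range(n_steps)
    )
    # Extent: largest distance from the first point to any other point.
    extent = max(
        math.hypot(i, frame[i] - frame[0]) for i in range(1, len(frame))
    )
    log_n = math.log10(n_steps)
    denom = log_n + math.log10(extent / length)
    if denom <= 0:
        # Can happen for extremely jagged frames; the estimate is undefined.
        raise ValueError("degenerate frame: Katz FD is undefined")
    return log_n / denom


def append_fd(
    mfcc_frames: Sequence[Sequence[float]],
    signal_frames: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Append one FD value to each (precomputed) MFCC vector.
    Assumes mfcc_frames[k] and signal_frames[k] describe the same frame."""
    return [
        list(mfcc) + [katz_fd(frame)]
        for mfcc, frame in zip(mfcc_frames, signal_frames)
    ]
```

Under this definition a straight line has dimension 1 and more irregular frames score higher. The value depends on amplitude scaling, so in practice frames would likely need a consistent normalization before the FD is computed.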

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ezeiza, A., de Ipiña, K.L., Hernández, C., Barroso, N. (2011). Combining Mel Frequency Cepstral Coefficients and Fractal Dimensions for Automatic Speech Recognition. In: Travieso-González, C.M., Alonso-Hernández, J.B. (eds) Advances in Nonlinear Speech Processing. NOLISP 2011. Lecture Notes in Computer Science, vol 7015. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25020-0_24

  • DOI: https://doi.org/10.1007/978-3-642-25020-0_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25019-4

  • Online ISBN: 978-3-642-25020-0

  • eBook Packages: Computer Science, Computer Science (R0)
