Combining Mel Frequency Cepstral Coefficients and Fractal Dimensions for Automatic Speech Recognition

Ezeiza, Aitzol; López de Ipiña, Karmele; Hernández, Carmen; Barroso, Nora

doi:10.1007/978-3-642-25020-0_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7015)

Included in the following conference series: International Conference on Nonlinear Speech Processing

Abstract

Hidden Markov Models and Mel Frequency Cepstral Coefficients (MFCCs) are the de facto standard for Automatic Speech Recognition (ASR) systems, but they fail to capture the nonlinear dynamics present in speech waveforms. The extra information provided by nonlinear features could be especially useful when training data are scarce or when the ASR task is very complex. In this work, the Fractal Dimension (FD) of the observed time series is combined with the traditional MFCCs in the feature vector in order to enhance the performance of two different ASR systems: the first is a very simple system with very few training examples, and the second is a Large Vocabulary Continuous Speech Recognition system for Broadcast News.
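
As a rough illustration of what adding the FD to the feature vector can look like, the sketch below estimates one fractal dimension value per frame and appends it as an extra column to an existing MFCC matrix. It rests on several assumptions: the abstract does not say which FD estimator is used, so Higuchi's algorithm (a common choice for time series) stands in here; the names `higuchi_fd` and `append_fd`, the `k_max` value, and the frame parameters are hypothetical; and the MFCC matrix is assumed to come from whatever front end is already in place.

```python
import numpy as np


def higuchi_fd(x, k_max=8):
    """Estimate the fractal dimension of a 1-D series with Higuchi's method."""
    x = np.asarray(x, dtype=float)
    n = x.size
    log_inv_k, log_len = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):
            idx = np.arange(m, n, k)  # sub-series x[m], x[m+k], x[m+2k], ...
            if idx.size < 2:
                continue
            # Curve length of the sub-series, normalised to the full series.
            norm = (n - 1) / ((idx.size - 1) * k)
            lengths.append(np.abs(np.diff(x[idx])).sum() * norm / k)
        if lengths and np.mean(lengths) > 0:
            log_inv_k.append(np.log(1.0 / k))
            log_len.append(np.log(np.mean(lengths)))
    if len(log_len) < 2:
        # Assumed convention: a flat (e.g. silent) frame is treated as a smooth curve.
        return 1.0
    # The FD is the slope of log L(k) against log(1/k).
    return float(np.polyfit(log_inv_k, log_len, 1)[0])


def append_fd(mfcc, signal, frame_len, hop, k_max=8):
    """Append one FD value per frame to an (n_frames, n_coeffs) MFCC matrix.

    Assumes frame i covers signal[i*hop : i*hop + frame_len], matching the
    framing that produced the MFCCs.
    """
    fd = np.array([
        higuchi_fd(signal[i * hop:i * hop + frame_len], k_max)
        for i in range(mfcc.shape[0])
    ])
    return np.hstack([mfcc, fd[:, None]])
```

With this layout a 13-coefficient MFCC matrix becomes a 14-column matrix, and any delta or normalisation steps applied downstream would need to account for the extra column. A straight line yields an estimate of about 1 and white noise an estimate close to 2, which is a quick sanity check for the estimator.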

Author information

Authors and Affiliations

Department of System Engineering and Automation, University of the Basque Country, Spain
Aitzol Ezeiza, Karmele López de Ipiña & Carmen Hernández
Irunweb Enterprise, Auzolan 2B – 2, Irun, Spain
Nora Barroso

Editor information

Editors and Affiliations

Institute for Technological Development and Innovation in Communications (IDETIC), Signals and Communications Department, University of Las Palmas de Gran Canaria, Campus de Tafira, s/n, 35017, Las Palmas de Gran Canaria, Spain
Carlos M. Travieso-González & Jesús B. Alonso-Hernández

About this paper

Cite this paper

Ezeiza, A., López de Ipiña, K., Hernández, C., Barroso, N. (2011). Combining Mel Frequency Cepstral Coefficients and Fractal Dimensions for Automatic Speech Recognition. In: Travieso-González, C.M., Alonso-Hernández, J.B. (eds) Advances in Nonlinear Speech Processing. NOLISP 2011. Lecture Notes in Computer Science, vol 7015. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25020-0_24

DOI: https://doi.org/10.1007/978-3-642-25020-0_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25019-4
Online ISBN: 978-3-642-25020-0
eBook Packages: Computer Science, Computer Science (R0)
