Abstract
Hidden Markov Models (HMMs) and Mel Frequency Cepstral Coefficients (MFCCs) are a de facto standard for Automatic Speech Recognition (ASR) systems, but they fail to capture the nonlinear dynamics present in speech waveforms. The extra information provided by nonlinear features could be especially useful when training data is scarce or when the ASR task is very complex. In this work, the Fractal Dimension (FD) of the observed time series is combined with the traditional MFCCs in the feature vector in order to enhance the performance of two different ASR systems: the first is a very simple system trained on very few examples, and the second is a Large Vocabulary Continuous Speech Recognition system for Broadcast News.
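As a rough illustration of what combining the FD with MFCCs in the feature vector can look like, the sketch below computes one fractal dimension value per frame and appends it to that frame's cepstral vector. The abstract does not say which FD estimator is used, so Higuchi's method is assumed here. The function names (`higuchi_fd`, `split_frames`, `append_fd`), the value `k_max=8`, the frame sizes, and the all-zero 13-coefficient MFCC placeholders are hypothetical choices for this example, not details taken from the paper.

```python
import math
import random


def higuchi_fd(frame, k_max=8):
    """Estimate the fractal dimension of a 1-D frame with Higuchi's method.

    Assumption: k_max=8 is an arbitrary illustrative choice.
    """
    n = len(frame)
    if k_max < 2 or n < 2 * k_max:
        raise ValueError("need k_max >= 2 and at least 2 * k_max samples")
    xs, ys = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):
            # Number of k-sized steps in the sub-series starting at offset m.
            steps = (n - 1 - m) // k
            dist = sum(abs(frame[m + i * k] - frame[m + (i - 1) * k])
                       for i in range(1, steps + 1))
            # Normalised curve length for this offset and delay.
            lengths.append(dist * (n - 1) / (steps * k) / k)
        mean_len = sum(lengths) / k
        if mean_len == 0.0:
            # Flat or exactly k-periodic frame: fall back to 1.0
            # (an assumption of this sketch, not part of the method).
            return 1.0
        xs.append(math.log(1.0 / k))
        ys.append(math.log(mean_len))
    # Least-squares slope of log L(k) against log(1/k) is the FD estimate.
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


def split_frames(signal, frame_len, hop):
    """Cut a signal into overlapping frames, dropping the incomplete tail."""
    return [signal[s:s + frame_len]
            for s in range(0, len(signal) - frame_len + 1, hop)]


def append_fd(mfcc_frames, signal_frames, k_max=8):
    """Return one feature vector per frame: MFCCs followed by the frame's FD."""
    if len(mfcc_frames) != len(signal_frames):
        raise ValueError("MFCC and signal frame counts must match")
    return [list(m) + [higuchi_fd(f, k_max)]
            for m, f in zip(mfcc_frames, signal_frames)]


# Toy usage: white noise stands in for speech, zeros stand in for real MFCCs.
random.seed(0)
signal = [random.uniform(-1.0, 1.0) for _ in range(1600)]
frames = split_frames(signal, frame_len=400, hop=160)
mfcc = [[0.0] * 13 for _ in frames]  # placeholder; use a real MFCC front end
features = append_fd(mfcc, frames)
```

In a real system the MFCCs would come from the recogniser's own front end, and the FD would be computed on the same frames so that both parts of the vector describe the same stretch of signal.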
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ezeiza, A., de Ipiña, K.L., Hernández, C., Barroso, N. (2011). Combining Mel Frequency Cepstral Coefficients and Fractal Dimensions for Automatic Speech Recognition. In: Travieso-González, C.M., Alonso-Hernández, J.B. (eds) Advances in Nonlinear Speech Processing. NOLISP 2011. Lecture Notes in Computer Science, vol 7015. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25020-0_24
DOI: https://doi.org/10.1007/978-3-642-25020-0_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25019-4
Online ISBN: 978-3-642-25020-0
eBook Packages: Computer Science, Computer Science (R0)