Abstract
Human recognition of isolated vowels is remarkably robust given the intra- and inter-speaker variability of speech. Automatic recognition techniques, by contrast, typically perform poorly, notably for female or child speech, because a higher fundamental frequency (F0) yields a sparser sampling of the magnitude spectrum.
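The sparse-sampling argument can be made concrete with simple arithmetic. In an idealized, perfectly periodic voiced vowel, spectral energy appears only at the harmonic partials k·F0, so the spectral envelope (and hence the formant structure) is observed only at those frequencies. The Python sketch below counts how many harmonics fall below 4 kHz, and inside an illustrative 500 to 1000 Hz band, for three F0 values. The helper names, the band limits and the F0 values are hypothetical, chosen for illustration rather than taken from the paper.

```python
# Hypothetical helpers illustrating how F0 sets the spacing of spectral-envelope samples.


def harmonic_frequencies(f0_hz: float, f_max_hz: float) -> list[float]:
    """Frequencies (Hz) of the harmonic partials k * F0, k = 1, 2, ..., up to f_max_hz."""
    if f0_hz <= 0:
        raise ValueError("F0 must be positive")
    count = int(f_max_hz // f0_hz)
    return [k * f0_hz for k in range(1, count + 1)]


def samples_in_band(f0_hz: float, lo_hz: float, hi_hz: float) -> int:
    """Number of harmonics (i.e. envelope samples) falling inside [lo_hz, hi_hz]."""
    return sum(1 for f in harmonic_frequencies(f0_hz, hi_hz) if f >= lo_hz)


if __name__ == "__main__":
    # Illustrative F0 values, roughly typical of adult male, adult female and child voices.
    for f0 in (100.0, 200.0, 300.0):
        below_4k = len(harmonic_frequencies(f0, 4000.0))
        in_band = samples_in_band(f0, 500.0, 1000.0)
        print(f"F0 = {f0:.0f} Hz: {below_4k} harmonics below 4 kHz, {in_band} in 500-1000 Hz")
```

Tripling F0 from 100 Hz to 300 Hz cuts the number of envelope samples below 4 kHz from 40 to 13 and leaves only two samples in the illustrative 500 to 1000 Hz band, which is the sense in which a high-pitched voice samples the spectrum more sparsely.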
In this paper we extend previous results on a perceptually motivated approach to vowel recognition based on Perceptual Spectral Clusters (PSC) of harmonic partials. We study the effect of normalizing relevant PSC features by F0, taking as a reference the recognition performance of static features derived from either Linear Prediction (LP) analysis or Mel-Frequency Cepstral Coefficients (MFCC), and using the Mahalanobis distance on a database of five natural Portuguese vowel sounds uttered by 44 speakers. Test results show that the recognition performance of F0-normalized PSC features improves, approaching that of MFCC features. These results are significant because PSC-related features are amenable to concurrent vowel identification, whereas LP- and MFCC-related features are not.
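The abstract does not specify which PSC features are normalized or exactly how, so the following is only a minimal sketch under stated assumptions: each feature is taken, hypothetically, to be a frequency in Hz, and F0 normalization is taken to mean dividing it by the speaker's F0. Vowels are then labeled by the smallest Mahalanobis distance to each class mean, using a within-class covariance pooled over all classes (the paper may instead use per-class covariances). The function names and the toy numbers are invented for illustration; they are not the author's features or data.

```python
from statistics import fmean

Vector = list[float]


def normalize_by_f0(features_hz: Vector, f0_hz: float) -> Vector:
    """Hypothetical F0 normalization: express frequency-valued features in multiples of F0."""
    # Assumption: features are frequencies in Hz; this is not taken from the paper.
    return [f / f0_hz for f in features_hz]


def mean_vector(rows: list[Vector]) -> Vector:
    """Component-wise mean of a list of equal-length vectors."""
    return [fmean(col) for col in zip(*rows)]


def pooled_covariance(classes: dict[str, list[Vector]], means: dict[str, Vector]) -> list[Vector]:
    """Within-class covariance pooled over all classes (divides by N - C)."""
    dim = len(next(iter(means.values())))
    cov = [[0.0] * dim for _ in range(dim)]
    n_total = 0
    for label, rows in classes.items():
        mu = means[label]
        for x in rows:
            d = [a - b for a, b in zip(x, mu)]
            for i in range(dim):
                for j in range(dim):
                    cov[i][j] += d[i] * d[j]
        n_total += len(rows)
    dof = n_total - len(classes)
    return [[v / dof for v in row] for row in cov]


def invert(m: list[Vector]) -> list[Vector]:
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [row[n:] for row in a]


def mahalanobis_sq(x: Vector, mu: Vector, inv_cov: list[Vector]) -> float:
    """Squared Mahalanobis distance (x - mu)^T S^-1 (x - mu)."""
    d = [a - b for a, b in zip(x, mu)]
    return sum(d[i] * inv_cov[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))


def train(classes: dict[str, list[Vector]]) -> tuple[dict[str, Vector], list[Vector]]:
    """Per-class means plus the inverse of the pooled within-class covariance."""
    means = {label: mean_vector(rows) for label, rows in classes.items()}
    return means, invert(pooled_covariance(classes, means))


def classify(x: Vector, model: tuple[dict[str, Vector], list[Vector]]) -> str:
    """Label of the class mean closest to x in Mahalanobis distance."""
    means, inv_cov = model
    return min(means, key=lambda label: mahalanobis_sq(x, means[label], inv_cov))


if __name__ == "__main__":
    # Made-up toy data: (F0 in Hz, two hypothetical frequency-valued features in Hz).
    raw = {
        "a": [(100.0, [700.0, 1200.0]), (200.0, [1500.0, 2300.0]), (250.0, [1700.0, 3100.0])],
        "i": [(100.0, [300.0, 2300.0]), (200.0, [620.0, 4500.0]), (250.0, [700.0, 5900.0])],
    }
    training = {v: [normalize_by_f0(feats, f0) for f0, feats in items] for v, items in raw.items()}
    model = train(training)
    print(classify(normalize_by_f0([2150.0, 3500.0], 300.0), model))  # a
    print(classify(normalize_by_f0([450.0, 3450.0], 150.0), model))   # i
```

Pooling the covariance across classes keeps the estimate well conditioned when each vowel has only a few training tokens; with enough data per class, per-class covariance matrices are a natural alternative.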
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ferreira, A. (2009). Automatic Recognition of Isolated Vowels Using F0-Normalized Harmonic Features. In: Filipe, J., Obaidat, M.S. (eds) e-Business and Telecommunications. ICETE 2008. Communications in Computer and Information Science, vol 48. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05197-5_22
DOI: https://doi.org/10.1007/978-3-642-05197-5_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05196-8
Online ISBN: 978-3-642-05197-5
eBook Packages: Computer Science, Computer Science (R0)