Abstract
Gender detection from running speech is an important objective for improving efficiency in tasks such as speech or speaker recognition, among others. Traditionally, gender detection has focused on fundamental frequency (f0) and cepstral features derived from voiced segments of speech. The methodology presented here discards f0 as a valid feature because its estimation is complicated, or even impossible, in unvoiced fragments, and because it is not reliable in emotional or strongly prosodic speech. The approach consists in obtaining uncorrelated glottal and vocal tract components, which are parameterized as mel-frequency coefficients. K-fold and cross-validation using QDA and GMM classifiers showed detection rates as large as 99.77% in a gender-balanced database of running speech from 340 speakers.
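The evaluation scheme named in the abstract (K-fold validation of a QDA classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single scalar feature instead of mel-frequency coefficient vectors, and it draws synthetic values with arbitrary means and spreads instead of using the speech database. The function names `fit_gaussian`, `qda_predict`, `kfold_accuracy`, and `make_synthetic` are hypothetical. In one dimension, QDA reduces to fitting one Gaussian per class, each with its own variance, and choosing the class with the largest log-posterior.

```python
import math
import random
import statistics


def fit_gaussian(values):
    """Return (mean, variance) estimated from a 1-D sample."""
    return statistics.fmean(values), statistics.variance(values)


def log_gaussian(x, mean, var):
    """Log-density of a univariate Gaussian at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def qda_predict(x, params, priors):
    """1-D QDA: pick the class with the largest log-posterior."""
    return max(
        params,
        key=lambda c: math.log(priors[c]) + log_gaussian(x, *params[c]),
    )


def kfold_accuracy(samples, k=5, seed=0):
    """Mean accuracy over k folds; samples is a list of (feature, label)."""
    data = samples[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]  # k disjoint, near-equal folds
    scores = []
    for i, test in enumerate(folds):
        # Train on every fold except the held-out one.
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        labels = sorted({y for _, y in train})
        params = {
            c: fit_gaussian([x for x, y in train if y == c]) for c in labels
        }
        priors = {
            c: sum(1 for _, y in train if y == c) / len(train) for c in labels
        }
        hits = sum(qda_predict(x, params, priors) == y for x, y in test)
        scores.append(hits / len(test))
    return statistics.fmean(scores)


def make_synthetic(n_per_class=170, seed=1):
    """Synthetic stand-in data: the means and spreads are arbitrary assumptions."""
    rng = random.Random(seed)
    a = [(rng.gauss(-1.0, 1.0), "female") for _ in range(n_per_class)]
    b = [(rng.gauss(1.5, 0.6), "male") for _ in range(n_per_class)]
    return a + b


if __name__ == "__main__":
    print(f"mean 5-fold accuracy: {kfold_accuracy(make_synthetic()):.3f}")
```

The per-class variance is what distinguishes QDA from a linear discriminant, which would pool the variance across classes. The accuracy printed by this sketch reflects only the synthetic data and says nothing about the rates reported in the paper.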
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Muñoz-Mulas, C., Martínez-Olalla, R., Gómez-Vilda, P., Álvarez-Marquina, A., Mazaira-Fernández, L.M. (2013). Gender Detection in Running Speech from Glottal and Vocal Tract Correlates. In: Drugman, T., Dutoit, T. (eds) Advances in Nonlinear Speech Processing. NOLISP 2013. Lecture Notes in Computer Science(), vol 7911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38847-7_4
DOI: https://doi.org/10.1007/978-3-642-38847-7_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38846-0
Online ISBN: 978-3-642-38847-7
eBook Packages: Computer Science (R0)