Audio Features Selection for Automatic Height Estimation from Speech

Ganchev, Todor; Mporas, Iosif; Fakotakis, Nikos

doi:10.1007/978-3-642-12842-4_12

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6040)

Abstract

Aiming at automatic estimation of a person's height from speech, we investigate the applicability of various subsets of speech features, formed by ranking the relevance and individual quality of a large number of audio features. Specifically, based on a relevance ranking of the large set of openSMILE audio descriptors, we selected subsets of different sizes and evaluated them on the height estimation task. In brief, during speech parameterization every input utterance is converted to a single feature vector of 6552 parameters. A subset of this feature vector is then fed to a support vector machine (SVM)-based regression model, which directly estimates the height of an unknown speaker. The experimental evaluation on the TIMIT database demonstrated that (i) the feature vector composed of the top-50 ranked parameters provides a good trade-off between computational demands and accuracy, and (ii) the best accuracy, in terms of mean absolute error and root mean square error, is observed for the top-200 subset.
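Two steps named in the abstract, cutting a ranked feature vector down to its top-k entries and scoring predictions with mean absolute error and root mean square error, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the toy feature values, the ranking, and the heights are all hypothetical. The ranking is assumed to be given (the abstract does not specify how it is computed), and the SVM regression step is left out.

```python
import math


def select_top_k(feature_vector, ranking, k):
    """Keep the k highest-ranked entries of one utterance-level feature vector.

    `ranking` lists feature indices from most to least relevant; it is a
    hypothetical input standing in for the relevance ranking in the abstract.
    """
    if k > len(ranking):
        raise ValueError("k exceeds the number of ranked features")
    return [feature_vector[i] for i in ranking[:k]]


def mean_absolute_error(true_heights, predicted_heights):
    """Average of the absolute differences between true and predicted heights."""
    errors = [abs(t - p) for t, p in zip(true_heights, predicted_heights)]
    return sum(errors) / len(errors)


def root_mean_square_error(true_heights, predicted_heights):
    """Square root of the average squared difference."""
    squared = [(t - p) ** 2 for t, p in zip(true_heights, predicted_heights)]
    return math.sqrt(sum(squared) / len(squared))


# Toy example (invented numbers): a 5-dimensional vector instead of 6552.
vector = [0.3, 1.7, -0.4, 2.2, 0.9]
ranking = [3, 1, 4, 0, 2]          # assumed relevance order of feature indices
subset = select_top_k(vector, ranking, 2)

# Invented true and predicted heights in centimetres for three speakers.
true_cm = [170.0, 182.0, 165.0]
pred_cm = [174.0, 179.0, 165.0]
mae = mean_absolute_error(true_cm, pred_cm)
rmse = root_mean_square_error(true_cm, pred_cm)
```

In a full pipeline the reduced vectors would be passed to a regression model, and the two error measures would be computed on held-out speakers for each subset size being compared.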

Author information

Authors and Affiliations

Artificial Intelligence Group, Wire Communications Laboratory, Dept. of Electrical and Computer Engineering, University of Patras, 26500, Rion-Patras, Greece
Todor Ganchev, Iosif Mporas & Nikos Fakotakis

Editor information

Editors and Affiliations

Institute of Informatics and Telecommunications, NCSR Demokritos, Ag. Paraskevi, 15310, Athens, Greece
Stasinos Konstantopoulos, Stavros Perantonis, Vangelis Karkaletsis & Constantine D. Spyropoulos
Department of Information and Communication Systems Engineering, University of the Aegean, 83200, Karlovassi, Samos, Greece
George Vouros

About this paper

Cite this paper

Ganchev, T., Mporas, I., Fakotakis, N. (2010). Audio Features Selection for Automatic Height Estimation from Speech. In: Konstantopoulos, S., Perantonis, S., Karkaletsis, V., Spyropoulos, C.D., Vouros, G. (eds) Artificial Intelligence: Theories, Models and Applications. SETN 2010. Lecture Notes in Computer Science, vol 6040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12842-4_12

DOI: https://doi.org/10.1007/978-3-642-12842-4_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12841-7
Online ISBN: 978-3-642-12842-4
eBook Packages: Computer Science, Computer Science (R0)
