Language-Independent Age Estimation from Speech Using Phonological and Phonemic Features

Haderlein, Tino; Middag, Catherine; Hönig, Florian; Martens, Jean-Pierre; Döllinger, Michael; Schützenberger, Anne; Nöth, Elmar

doi:10.1007/978-3-319-24033-6_19

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9302)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

Language-independent and alignment-free phonological and phonemic features were applied for automatic age estimation based on voice and speech properties. 110 persons (average: 75.7 years) read the German version of the text “The North Wind and the Sun”. For comparison with the automatic approach, five listeners estimated the speakers’ age perceptually. Support Vector Regression and feature selection were used to compute the best model of aging. This model was found to use the following features: (a) the percentage of voiced frames, (b) eight phonological features, representing vowel height, nasality in consonants, turbulence, and position of the lips, and finally, (c) seven phonemic features. The latter features might be relevant due to altered articulation because of dentures. The mean absolute error between computed and chronological age was 5.2 years (RMSE: 7.0). It was 7.7 years (RMSE: 9.6) for an optimistic trivial estimator and 10.5 years (RMSE: 11.9) for the average listener.
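
The two error measures quoted in the abstract can be illustrated with a minimal sketch. It assumes that the "optimistic trivial estimator" always predicts the mean age of the whole speaker group, which the abstract does not spell out. The function names and the toy ages below are hypothetical and are not taken from the paper.

```python
import math


def mean_absolute_error(true_ages, predicted_ages):
    """Average absolute difference between chronological and estimated age."""
    return sum(abs(t - p) for t, p in zip(true_ages, predicted_ages)) / len(true_ages)


def root_mean_squared_error(true_ages, predicted_ages):
    """Square root of the average squared difference (penalizes large misses more)."""
    squared = [(t - p) ** 2 for t, p in zip(true_ages, predicted_ages)]
    return math.sqrt(sum(squared) / len(true_ages))


def trivial_mean_estimate(true_ages):
    """Assumed trivial baseline: predict the group's mean age for every speaker."""
    mean_age = sum(true_ages) / len(true_ages)
    return [mean_age] * len(true_ages)


# Hypothetical toy data, NOT results from the paper.
chronological = [60.0, 70.0, 80.0, 90.0]
model_output = [64.0, 68.0, 83.0, 85.0]

baseline = trivial_mean_estimate(chronological)

mae_model = mean_absolute_error(chronological, model_output)
rmse_model = root_mean_squared_error(chronological, model_output)
mae_baseline = mean_absolute_error(chronological, baseline)
rmse_baseline = root_mean_squared_error(chronological, baseline)

print(f"model:    MAE={mae_model:.2f}  RMSE={rmse_model:.2f}")
print(f"baseline: MAE={mae_baseline:.2f}  RMSE={rmse_baseline:.2f}")
```

RMSE is never smaller than MAE on the same data, which is consistent with each pair of figures in the abstract. A baseline of this kind is "optimistic" in the sense that it uses the mean of the very speakers it is evaluated on.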

Author information

Authors and Affiliations

Lehrstuhl für Mustererkennung (Informatik 5), Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), Martensstraße 3, 91058, Erlangen, Germany
Tino Haderlein, Florian Hönig & Elmar Nöth
Vakgroep voor Elektronica en Informatiesystemen (ELIS), Universiteit Gent, Sint-Pietersnieuwstraat 41, 9000, Gent, Belgium
Catherine Middag & Jean-Pierre Martens
Phoniatrische und pädaudiologische Abteilung in der HNO-Klinik, Klinikum der Universität Erlangen-Nürnberg, Bohlenplatz 21, 91054, Erlangen, Germany
Michael Döllinger & Anne Schützenberger

Corresponding author

Correspondence to Tino Haderlein.

Editor information

Editors and Affiliations

University of West Bohemia, Pilsen, Czech Republic
Pavel Král
University of West Bohemia, Pilsen, Czech Republic
Václav Matoušek

About this paper

Cite this paper

Haderlein, T. et al. (2015). Language-Independent Age Estimation from Speech Using Phonological and Phonemic Features. In: Král, P., Matoušek, V. (eds) Text, Speech, and Dialogue. TSD 2015. Lecture Notes in Computer Science, vol 9302. Springer, Cham. https://doi.org/10.1007/978-3-319-24033-6_19

DOI: https://doi.org/10.1007/978-3-319-24033-6_19
Published: 11 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24032-9
Online ISBN: 978-3-319-24033-6
eBook Packages: Computer Science, Computer Science (R0)