Abstract
In this paper, we present a novel approach to the paralinguistic task of recognizing speaker age and gender from voice, based on deep neural networks. The proposed models were trained and tested on the German speech corpus aGender. We conducted experiments with different network topologies, including neural networks with fully-connected and convolutional layers. In joint recognition of speaker age and gender, our system reached an unweighted accuracy of 48.41%; in separate age and gender recognition setups, it achieved 57.53% and 88.80%, respectively. The applied deep neural networks outperform existing traditional classification methods on speaker age recognition.
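The abstract describes convolutional and fully-connected topologies for classifying age and gender from speech. As a minimal, hypothetical sketch of such a model (not the authors' actual architecture — layer sizes, the 7-class joint label set, and the input spectrogram shape are assumptions), a PyTorch-style network might look like this:

```python
import torch
import torch.nn as nn

class AgeGenderCNN(nn.Module):
    """Illustrative CNN for joint age/gender classification from log-mel spectrograms."""

    def __init__(self, n_classes: int = 7):  # 7 joint classes assumed for aGender
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # (B, 1, F, T) -> (B, 16, F, T)
            nn.ReLU(),
            nn.MaxPool2d(2),                              # halve both time and frequency
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),                 # fixed-size output for any input length
        )
        self.classifier = nn.Linear(32 * 4 * 4, n_classes)  # fully-connected output layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = AgeGenderCNN()
spec = torch.randn(8, 1, 64, 100)  # batch of 8 spectrograms: 64 mel bands, 100 frames
logits = model(spec)
print(logits.shape)  # torch.Size([8, 7])
```

The adaptive pooling layer lets the same network handle utterances of varying duration, which is a common design choice for spectrogram inputs; whether the paper uses this mechanism is not stated in the abstract.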
Acknowledgements
This research is supported by the Russian Science Foundation (project No. 18-11-00145).
Copyright information
© 2019 Springer Nature Switzerland AG
Cite this paper
Markitantov, M., Verkholyak, O. (2019). Automatic Recognition of Speaker Age and Gender Based on Deep Neural Networks. In: Salah, A., Karpov, A., Potapova, R. (eds.) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol. 11658. Springer, Cham. https://doi.org/10.1007/978-3-030-26061-3_34
Print ISBN: 978-3-030-26060-6
Online ISBN: 978-3-030-26061-3