
Automatic Recognition of Speaker Age and Gender Based on Deep Neural Networks

  • Conference paper

Speech and Computer (SPECOM 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11658)

Abstract

In this article, we present a novel approach, based on deep neural networks, to the paralinguistic task of recognizing speaker age and gender from voice. The proposed models were trained and tested on the German speech corpus aGender. We conducted experiments with different network topologies, including neural networks with fully-connected and convolutional layers. In joint recognition of speaker age and gender, our system reached an unweighted accuracy of 48.41%; in separate age and gender recognition setups, it reached 57.53% and 88.80%, respectively. The applied deep neural networks achieve the best speaker age recognition results in comparison with existing traditional classification methods.
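The joint setup described in the abstract treats age and gender as a single classification problem over combined labels. As a rough illustration only (not the authors' implementation), a fully-connected network for a 7-class joint label set could look like the sketch below; the class names, the 40-dimensional input features, and the layer sizes are assumptions for illustration:

```python
import numpy as np

# Hypothetical joint age-gender classes, assuming a 7-class scheme like
# aGender's: child, plus {young, adult, senior} x {female, male}.
CLASSES = ["child", "young_f", "young_m",
           "adult_f", "adult_m", "senior_f", "senior_m"]

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Two hidden fully-connected layers; the weights are random stand-ins for
# a trained model, and the 40-dim input stands in for acoustic features.
dims = [40, 64, 32, len(CLASSES)]
weights = [rng.standard_normal((i, o)) * np.sqrt(2.0 / i)
           for i, o in zip(dims[:-1], dims[1:])]
biases = [np.zeros(o) for o in dims[1:]]

def forward(features):
    h = features
    for w, b in zip(weights[:-1], biases[:-1]):
        h = relu(h @ w + b)
    return softmax(h @ weights[-1] + biases[-1])

utterance = rng.standard_normal((1, 40))  # one feature vector per utterance
posteriors = forward(utterance)           # shape (1, 7); each row sums to 1
predicted = CLASSES[int(posteriors.argmax())]
```

In practice such a model would be trained on acoustic features extracted from the speech signal (e.g. MFCCs), and the convolutional variants mentioned in the abstract would replace the first fully-connected layers with convolutions over a time-frequency representation.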




Acknowledgements

This research is supported by the Russian Science Foundation (project No. 18-11-00145).

Author information

Corresponding author

Correspondence to Maxim Markitantov.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Markitantov, M., Verkholyak, O. (2019). Automatic Recognition of Speaker Age and Gender Based on Deep Neural Networks. In: Salah, A., Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol 11658. Springer, Cham. https://doi.org/10.1007/978-3-030-26061-3_34


  • DOI: https://doi.org/10.1007/978-3-030-26061-3_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26060-6

  • Online ISBN: 978-3-030-26061-3

  • eBook Packages: Computer Science (R0)
