Voice Gender Recognition Using Acoustic Features, MFCCs and SVM

  • Conference paper
Computational Science and Its Applications – ICCSA 2022 (ICCSA 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13375))

Abstract

This paper presents a voice gender recognition system. Acoustic features and Mel-Frequency Cepstral Coefficients (MFCCs) are extracted to determine the speaker's gender. Acoustic features are the most commonly used features in studies of this kind; in this work, we combined them with MFCCs to test whether the combination yields more satisfactory results. To examine the performance of the proposed system, we evaluated it on four databases: the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Saarbruecken Voice Database (SVD), the CMU_ARCTIC database and a self-created Amazigh speech database. At the pre-processing stage, we removed silence from the signals using the Zero-Crossing Rate (ZCR) but left the noise in place. A Support Vector Machine (SVM) is used as the classification model. The combination of acoustic features and MFCCs achieves an average accuracy of 90.61% on the RAVDESS database, 92.73% on the SVD database, 99.87% on the CMU_ARCTIC database and 99.95% on the Amazigh speech database.
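
The ZCR-based silence removal mentioned above can be illustrated with a minimal NumPy sketch. The abstract does not give the frame length, the threshold or the exact decision rule, so everything here is an assumption made for illustration: the function names, the 25 ms non-overlapping frames, the 0.3 threshold, and the rule that frames with a high zero-crossing rate are treated as background and dropped. That rule relies on voiced speech having a low ZCR and noise-like background having a high one; it would also discard unvoiced consonants, so it should not be read as the authors' procedure.

```python
import numpy as np


def frame_signal(signal, frame_len, hop_len):
    """Split a 1-D signal into frames; a trailing partial frame is dropped."""
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]


def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)


def remove_silence(signal, sr, frame_ms=25, zcr_threshold=0.3):
    """Keep only frames whose ZCR is at or below the threshold.

    Assumption: high-ZCR frames are noise-like background ("silence"),
    low-ZCR frames are voiced speech. Both defaults are illustrative.
    """
    frame_len = int(sr * frame_ms / 1000)
    frames = frame_signal(signal, frame_len, frame_len)  # non-overlapping
    keep = zero_crossing_rate(frames) <= zcr_threshold
    return frames[keep].reshape(-1)


if __name__ == "__main__":
    # Synthetic check: 0.5 s of a 150 Hz tone followed by 0.5 s of weak noise.
    sr = 16000
    rng = np.random.default_rng(0)
    t = np.arange(sr // 2) / sr
    voiced = 0.8 * np.sin(2 * np.pi * 150 * t)
    background = 0.01 * rng.standard_normal(sr // 2)
    trimmed = remove_silence(np.concatenate([voiced, background]), sr)
    print(len(trimmed) / sr, "seconds kept")
```

In practice a ZCR criterion is often paired with short-time energy, since digital silence (all zeros) has a ZCR of zero and would pass this threshold unchanged.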

Acknowledgment

Special thanks to Mr. T. Hobson of the Anglosphere English Center for reviewing the paper for spelling and grammatical mistakes, and to all the participants who recorded their voices for this research.

Author information

Corresponding author

Correspondence to Fadwa Abakarim.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Abakarim, F., Abenaou, A. (2022). Voice Gender Recognition Using Acoustic Features, MFCCs and SVM. In: Gervasi, O., Murgante, B., Hendrix, E.M.T., Taniar, D., Apduhan, B.O. (eds) Computational Science and Its Applications – ICCSA 2022. ICCSA 2022. Lecture Notes in Computer Science, vol 13375. Springer, Cham. https://doi.org/10.1007/978-3-031-10522-7_43

  • DOI: https://doi.org/10.1007/978-3-031-10522-7_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-10521-0

  • Online ISBN: 978-3-031-10522-7

  • eBook Packages: Computer Science (R0)
