Voice Gender Recognition Using Acoustic Features, MFCCs and SVM

Abakarim, Fadwa; Abenaou, Abdenbi

doi:10.1007/978-3-031-10522-7_43


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13375)

Included in the following conference series:

International Conference on Computational Science and Its Applications

Abstract

This paper presents a voice gender recognition system. Acoustic features and Mel-Frequency Cepstral Coefficients (MFCCs) are extracted to determine the speaker's gender. Acoustic features are the most commonly used features in studies of this kind; in this work, we combine them with MFCCs to test whether the combination yields more satisfactory results. To examine the performance of the proposed system, we evaluated it on four databases: the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Saarbruecken Voice Database (SVD), the CMU_ARCTIC database, and a self-created Amazigh speech database. In the pre-processing stage, we removed silence from the signals using the Zero-Crossing Rate (ZCR) but left the noise in place. A Support Vector Machine (SVM) is used as the classification model. The combination of acoustic features and MFCCs achieves an average accuracy of 90.61% on the RAVDESS database, 92.73% on the SVD database, 99.87% on the CMU_ARCTIC database, and 99.95% on the Amazigh speech database.
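
The abstract does not say how ZCR is turned into a silence decision, so the following is only a minimal sketch of one plausible reading. It assumes that non-speech frames in a noisy recording consist of low-level noise that changes sign often (high ZCR), while voiced speech changes sign rarely (low ZCR). The function names, the frame length, and the threshold are hypothetical and are not taken from the paper; practical systems often pair ZCR with short-time energy.

```python
from typing import List, Sequence


def zero_crossing_rate(frame: Sequence[float]) -> float:
    """Return the fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def split_frames(signal: Sequence[float], frame_len: int) -> List[List[float]]:
    """Cut the signal into non-overlapping frames; a short tail is dropped."""
    last_start = len(signal) - frame_len
    return [
        list(signal[i:i + frame_len])
        for i in range(0, last_start + 1, frame_len)
    ]


def drop_silence(
    signal: Sequence[float],
    frame_len: int = 4,          # hypothetical; real frames span ~20-30 ms
    zcr_threshold: float = 0.5,  # hypothetical; would be tuned per dataset
) -> List[float]:
    """Keep only frames whose ZCR is at or below the threshold.

    Assumption (not from the paper): high-ZCR frames are treated as
    silence or background noise and are discarded.
    """
    kept: List[float] = []
    for frame in split_frames(signal, frame_len):
        if zero_crossing_rate(frame) <= zcr_threshold:
            kept.extend(frame)
    return kept
```

In a full pipeline of the kind the abstract describes, the retained samples would then be passed to acoustic-feature and MFCC extraction, and the resulting feature vectors to an SVM classifier.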

Acknowledgment

Special thanks to Mr. T. Hobson of the Anglosphere English Center for reviewing the manuscript for spelling and grammatical mistakes, and to all the participants who recorded their voices for this research.

Author information

Authors and Affiliations

Research Team of Applied Mathematics and Intelligent Systems Engineering, National School of Applied Sciences, Ibn Zohr University, 80000, Agadir, Morocco
Fadwa Abakarim & Abdenbi Abenaou

Corresponding author

Correspondence to Fadwa Abakarim.

Editor information

Editors and Affiliations

University of Perugia, Perugia, Italy
Osvaldo Gervasi
University of Basilicata, Potenza, Italy
Beniamino Murgante
Universidad de Málaga, Malaga, Spain
Eligius M. T. Hendrix
Monash University, Clayton, VIC, Australia
David Taniar
Kyushu Sangyo University, Fukuoka, Japan
Bernady O. Apduhan

About this paper

Cite this paper

Abakarim, F., Abenaou, A. (2022). Voice Gender Recognition Using Acoustic Features, MFCCs and SVM. In: Gervasi, O., Murgante, B., Hendrix, E.M.T., Taniar, D., Apduhan, B.O. (eds) Computational Science and Its Applications – ICCSA 2022. ICCSA 2022. Lecture Notes in Computer Science, vol 13375. Springer, Cham. https://doi.org/10.1007/978-3-031-10522-7_43

DOI: https://doi.org/10.1007/978-3-031-10522-7_43
Published: 15 July 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10521-0
Online ISBN: 978-3-031-10522-7
eBook Packages: Computer Science, Computer Science (R0)
