Abstract
This paper presents a voice gender recognition system. Acoustic features and Mel-Frequency Cepstral Coefficients (MFCCs) are extracted to determine the speaker's gender. Acoustic features are the most commonly used features in studies of this kind; in this work, we combine them with MFCCs to test whether the combination yields better results. To examine the performance of the proposed system, we evaluated it on four databases: the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), the Saarbruecken Voice Database (SVD), the CMU_ARCTIC database and a self-created Amazigh speech database. In the pre-processing stage, we removed silence from the signals using the Zero-Crossing Rate (ZCR) but left the noise in place. A Support Vector Machine (SVM) is used as the classification model. The combination of acoustic features and MFCCs achieves an average accuracy of 90.61% on the RAVDESS database, 92.73% on the SVD database, 99.87% on the CMU_ARCTIC database and 99.95% on the Amazigh speech database.
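To make the pre-processing step more concrete, the following is a minimal sketch of ZCR-based frame selection. The abstract does not state the decision rule, so this sketch assumes that frames whose ZCR exceeds a threshold are treated as silence or unvoiced background, based on the common observation that noise-like low-energy segments cross zero more often than voiced speech. The function names, the non-overlapping framing, `frame_len` and `zcr_threshold` are all illustrative assumptions and not values taken from the paper.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def drop_high_zcr_frames(signal, frame_len=400, zcr_threshold=0.25):
    """Keep only non-overlapping frames whose ZCR is at or below the threshold.

    Assumption (not from the paper): high-ZCR frames are silence/background.
    frame_len and zcr_threshold are illustrative values only.
    Trailing samples that do not fill a whole frame are discarded.
    """
    kept = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        if zero_crossing_rate(frame) <= zcr_threshold:
            kept.extend(frame)
    return kept


if __name__ == "__main__":
    # Synthetic example: a 200 Hz tone sampled at 16 kHz (voiced-like),
    # followed by a low-amplitude sign-alternating sequence (noise-like).
    voiced = [math.sin(2 * math.pi * 200 * n / 16000) for n in range(400)]
    background = [0.01 * (-1) ** n for n in range(400)]
    kept = drop_high_zcr_frames(voiced + background)
    print(len(kept))  # number of samples that survive the filter
```

In a full pipeline, the retained samples would then be passed to feature extraction (acoustic features and MFCCs) and the resulting vectors to the SVM; those stages are not sketched here because the abstract gives no parameters for them.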
Acknowledgment
Special thanks to Mr. T. Hobson from the Anglosphere English Center for reviewing the paper for spelling and grammatical mistakes, and to all the participants who recorded their voices for this research.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Abakarim, F., Abenaou, A. (2022). Voice Gender Recognition Using Acoustic Features, MFCCs and SVM. In: Gervasi, O., Murgante, B., Hendrix, E.M.T., Taniar, D., Apduhan, B.O. (eds) Computational Science and Its Applications – ICCSA 2022. ICCSA 2022. Lecture Notes in Computer Science, vol 13375. Springer, Cham. https://doi.org/10.1007/978-3-031-10522-7_43
DOI: https://doi.org/10.1007/978-3-031-10522-7_43
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-10521-0
Online ISBN: 978-3-031-10522-7
eBook Packages: Computer Science (R0)