Gender recognition in text-independent speaker identification using MFCC, spectrogram, Bi-LSTM, and rat swarm evolutionary algorithm optimization

Tiwari, Manish; Verma, Deepak Kumar

doi:10.1007/s10772-025-10176-2

Research
Published: 11 March 2025

Volume 28, pages 245–260 (2025)

International Journal of Speech Technology

Manish Tiwari¹ &
Deepak Kumar Verma¹

Abstract

Text-independent speaker recognition identifies speakers from their voice characteristics, irrespective of the content spoken. This paper introduces a new method for this task that combines Mel-frequency cepstral coefficients (MFCCs), bidirectional long short-term memory (Bi-LSTM) networks, and feature optimization based on the Rat Swarm Optimizer (RSO). MFCCs are first extracted from the speech signals as the primary feature set, capturing the key acoustic characteristics of the speaker's voice. To model temporal dependencies and improve speaker discrimination, a Bi-LSTM network is employed, which captures both forward and backward context in sequential data. The performance of the recognition system is further improved by optimizing the extracted features with the Rat Swarm Evolutionary Algorithm, a nature-inspired optimization technique that adapts the feature set to improve accuracy. Experiments on benchmark datasets show that the proposed system produces better results than traditional methods. The overall speaker identification accuracy is 99.02%, and the gender recognition accuracies are 96.72% for male speakers and 96.91% for female speakers, indicating that the model is robust across speaker groups. The integration of Bi-LSTM with RSO feature optimization offers a robust and efficient solution for text-independent speaker recognition in real-world scenarios.
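The abstract does not state the feature-extraction settings, so the following is only a minimal sketch of a conventional MFCC front end of the kind described above, not the authors' implementation. The 16 kHz sample rate, 25 ms frames, 10 ms hop, 26 mel filters, 13 coefficients, and the 0.97 pre-emphasis factor are common defaults assumed here for illustration, and the function names are hypothetical. The Bi-LSTM classifier and the RSO step are not shown.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):        # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):       # falling edge, peak of 1 at the center bin
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank

def mfcc(signal, sr=16000, frame_len=0.025, hop_len=0.010,
         n_fft=512, n_filters=26, n_ceps=13):
    """Return an (n_frames, n_ceps) MFCC matrix. All defaults are assumed values."""
    # Pre-emphasis boosts high frequencies before framing.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    flen, hop = int(round(frame_len * sr)), int(round(hop_len * sr))
    n_frames = 1 + (len(emphasized) - flen) // hop
    # Slice overlapping frames and apply a Hamming window to each.
    idx = np.arange(flen)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(flen)
    # Power spectrum, then log mel filterbank energies (epsilon avoids log(0)).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    energies = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # Unnormalized DCT-II decorrelates the log energies; keep the first n_ceps.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1) / (2 * n_filters))
    return energies @ basis.T

# Usage on a synthetic one-second 220 Hz tone standing in for a speech recording.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220.0 * t)
features = mfcc(tone, sr=sr)
print(features.shape)  # (98, 13)
```

Each row of `features` describes one frame, so the matrix is already the time-ordered sequence that a recurrent model such as a Bi-LSTM would read in both directions.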

Data availability

No datasets were generated or analysed during the current study.

Funding

NA.

Author information

Authors and Affiliations

Chhatrapati Shahu Ji Maharaj University, Kanpur, India
Manish Tiwari & Deepak Kumar Verma

Authors

Manish Tiwari
Deepak Kumar Verma

Contributions

Both authors equally contributed to the manuscript.

Corresponding author

Correspondence to Manish Tiwari.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tiwari, M., Verma, D.K. Gender recognition in text-independent speaker identification using MFCC, spectrogram, Bi-LSTM, and rat swarm evolutionary algorithm optimization. Int J Speech Technol 28, 245–260 (2025). https://doi.org/10.1007/s10772-025-10176-2

Received: 17 October 2024
Accepted: 17 February 2025
Published: 11 March 2025
Issue Date: March 2025
DOI: https://doi.org/10.1007/s10772-025-10176-2
