Gender recognition in text-independent speaker identification using MFCC, spectrogram, Bi-LSTM, and rat swarm evolutionary algorithm optimization

  • Research
International Journal of Speech Technology

Abstract

Text-independent speaker recognition identifies speakers from their voice characteristics, irrespective of the content spoken. This paper introduces a method for this task that combines Mel-frequency cepstral coefficients (MFCCs), bidirectional long short-term memory (Bi-LSTM) networks, and feature optimization based on the Rat Swarm Optimizer (RSO). MFCCs are first extracted from speech signals as the primary feature set, capturing the acoustic characteristics of the speaker's voice. To model temporal dependencies and improve speaker discrimination, a Bi-LSTM network is employed, which captures both forward and backward context in sequential data. The extracted features are then optimized with RSO, a nature-inspired optimization technique that adapts the feature set to improve accuracy. Experiments on benchmark datasets indicate that the proposed system produces better results than traditional methods. The overall speaker identification accuracy is 99.02%, and the gender recognition accuracy is 96.72% for male speakers and 96.91% for female speakers, which suggests that the model is robust across speaker groups. The integration of Bi-LSTM with RSO feature optimization presents a robust and efficient solution for text-independent speaker recognition in real-world scenarios.
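
The feature-optimization step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the update rule follows the chase-then-fight form commonly described for the Rat Swarm Optimizer, while the 0.5 threshold that turns a position into a binary feature mask, the clipping to [0, 1], the names `rso_minimize`, `to_mask`, `toy_fitness`, and the hidden `TARGET` mask are all assumptions made for illustration.

```python
import random


def rso_minimize(fitness, dim, n_rats=20, n_iter=50, seed=0):
    """Minimize `fitness` over [0, 1]^dim with a rat-swarm-style update."""
    rng = random.Random(seed)
    rats = [[rng.random() for _ in range(dim)] for _ in range(n_rats)]
    best = min(rats, key=fitness)[:]
    best_fit = fitness(best)
    r_param = rng.uniform(1.0, 5.0)  # R, drawn once from [1, 5]
    for t in range(n_iter):
        a = r_param - t * (r_param / n_iter)  # A shrinks linearly toward 0
        for i, pos in enumerate(rats):
            c = 2.0 * rng.random()  # C in [0, 2]
            new_pos = []
            for x, b in zip(pos, best):
                chase = a * x + c * (b - x)  # chase the best rat ("prey")
                x_new = abs(b - chase)       # fight step: distance to prey
                new_pos.append(min(1.0, max(0.0, x_new)))  # assumed clipping
            rats[i] = new_pos
            f = fitness(new_pos)
            if f < best_fit:  # keep the best solution seen so far
                best, best_fit = new_pos[:], f
    return best, best_fit


def to_mask(position, threshold=0.5):
    """Assumed binarization: keep feature j when position[j] > threshold."""
    return [1 if x > threshold else 0 for x in position]


# Toy stand-in for a validation error: mismatches against a hidden mask.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_fitness(position):
    return sum(m != t for m, t in zip(to_mask(position), TARGET))


best_pos, best_fit = rso_minimize(toy_fitness, dim=len(TARGET))
best_mask = to_mask(best_pos)
print("selected feature mask:", best_mask, "mismatches:", best_fit)
```

In a real pipeline, the toy fitness would presumably be replaced by a score such as the validation error of a classifier trained on the MFCC coefficients selected by the mask; the abstract does not specify the objective, so that substitution is left open here.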

Data availability

No datasets were generated or analysed during the current study.

Funding

Not applicable.

Author information

Contributions

Both authors contributed equally to the manuscript.

Corresponding author

Correspondence to Manish Tiwari.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tiwari, M., Verma, D.K. Gender recognition in text-independent speaker identification using MFCC, spectrogram, Bi-LSTM, and rat swarm evolutionary algorithm optimization. Int J Speech Technol 28, 245–260 (2025). https://doi.org/10.1007/s10772-025-10176-2

  • DOI: https://doi.org/10.1007/s10772-025-10176-2
