Speaker Identification Under Noisy Conditions Using Hybrid Deep Learning Model

  • Conference paper
Pan-African Conference on Artificial Intelligence (PanAfriConAI 2023)

Abstract

Speaker identification is a biometric mechanism that identifies the person who is speaking from a set of known speakers. It has vital applications in areas such as security, surveillance, and forensic investigation. Speaker identification systems achieve good accuracy on clean speech, but their performance degrades under noisy and mismatched conditions. Recently, hybrid networks that combine convolutional neural networks (CNNs) with enhanced recurrent neural network (RNN) variants have performed well in speech recognition, image classification, and other pattern recognition tasks. Moreover, cochleogram features have shown better accuracy in speech and speaker recognition under noisy conditions. However, no prior work has combined hybrid CNN and enhanced RNN variants with cochleogram input to improve speaker recognition accuracy in noisy environments. This study proposes a speaker identification model for noisy conditions that applies a hybrid CNN and bidirectional gated recurrent unit (BiGRU) network to cochleogram input. The model was evaluated on the VoxCeleb1 speech dataset with real-world noise, with white Gaussian noise (WGN), and without additive noise. Real-world noise and WGN were added to the dataset at signal-to-noise ratios (SNRs) from −5 dB to 20 dB in 5 dB steps. The proposed model attained accuracies of 93.15%, 97.55%, and 98.60% on the dataset with real-world noise at SNRs of −5 dB, 10 dB, and 20 dB, respectively, and showed approximately similar performance on WGN at the same SNR levels. On the dataset without additive noise, the model achieved 98.85% accuracy. The evaluation results and comparison with previous work indicate that the proposed model achieves better accuracy.
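
As a rough illustration of the pipeline the abstract describes, the sketch below shows (1) how a noise signal can be mixed into a clean utterance at a chosen SNR and (2) a hybrid CNN-BiGRU classifier over cochleogram input. This is a minimal sketch under assumed settings, not the authors' exact configuration: the cochleogram size (300 frames by 64 gammatone channels), layer widths, and optimizer are illustrative choices, and the 1,251-class output corresponds to the VoxCeleb1 identification split.

```python
# Illustrative sketch only (assumed shapes and layer sizes, not the paper's exact model).
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then add."""
    noise = np.resize(noise, clean.shape)               # crude length matching
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_clean / (p_noise * 10.0 ** (snr_db / 10.0)))
    return clean + scale * noise

def build_cnn_bigru(time_frames=300, channels=64, num_speakers=1251):
    inputs = layers.Input(shape=(time_frames, channels, 1))
    # CNN front end: local time-frequency patterns from the cochleogram.
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    # Collapse the frequency/channel axes so the BiGRU sees one vector per frame.
    t, f, c = x.shape[1], x.shape[2], x.shape[3]
    x = layers.Reshape((t, f * c))(x)
    # Bidirectional GRU models temporal dynamics in both directions.
    x = layers.Bidirectional(layers.GRU(128))(x)
    x = layers.Dense(256, activation="relu")(x)
    outputs = layers.Dense(num_speakers, activation="softmax")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

In practice the noise mixing would be applied to raw waveforms before cochleogram extraction (for example, with a gammatone filterbank), and the network would be trained as a closed-set classifier over speaker labels.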

Author information

Corresponding author

Correspondence to Wondimu Lambamo.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Lambamo, W., Srinivasagan, R., Jifara, W. (2024). Speaker Identification Under Noisy Conditions Using Hybrid Deep Learning Model. In: Debelee, T.G., Ibenthal, A., Schwenker, F., Megersa Ayano, Y. (eds) Pan-African Conference on Artificial Intelligence. PanAfriConAI 2023. Communications in Computer and Information Science, vol 2068. Springer, Cham. https://doi.org/10.1007/978-3-031-57624-9_9

  • DOI: https://doi.org/10.1007/978-3-031-57624-9_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-57623-2

  • Online ISBN: 978-3-031-57624-9

  • eBook Packages: Computer Science, Computer Science (R0)
