
Performance enhancement of text-independent speaker recognition in noisy and reverberation conditions using Radon transform with deep learning

Published in: International Journal of Speech Technology

Abstract

Automatic Speaker Recognition (ASR) under mismatched conditions is a challenging task, since it requires robust feature extraction and classification techniques. The Long Short-Term Memory Recurrent Neural Network (LSTM-RNN) is an efficient network that can learn to recognize speakers text-independently when the training and testing recording conditions are similar; unfortunately, its performance degrades when those conditions differ. In this paper, features are extracted from the Radon projection of speech-signal spectrograms, since the Radon Transform (RT) is less sensitive to noise and reverberation. The Radon projection is applied to the spectrograms of the speech signals, and the 2-D Discrete Cosine Transform (DCT) is then computed. This technique improves the recognition accuracy of the text-independent system while reducing its sensitivity to noise and reverberation effects. The performance of the ASR system with the proposed features is compared with that of systems based on Mel-Frequency Cepstral Coefficients (MFCCs) and spectrum features. For noisy utterances at 25 dB, the recognition rate with the proposed features reaches 80%, compared with 27% for MFCCs and 28% for spectrum features. For reverberant speech, the recognition rate reaches 80.67% with the proposed features, compared with 54% for MFCCs and 62.67% for spectrum features.
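The feature pipeline described above (spectrogram, then Radon projection, then 2-D DCT) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `radon_dct_features`, the STFT parameters, the number of projection angles, and the number of retained DCT coefficients are all assumptions, and the Radon transform is approximated here by rotating the spectrogram image and summing along one axis.

```python
import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import rotate
from scipy.fft import dctn

def radon_dct_features(x, fs, n_angles=60, n_coeffs=12):
    """Radon-projection + 2-D DCT features from a speech spectrogram (sketch)."""
    # Log-magnitude spectrogram of the utterance
    _, _, S = spectrogram(x, fs=fs, nperseg=256, noverlap=128)
    img = np.log(S + 1e-10)
    img = (img - img.mean()) / (img.std() + 1e-10)  # normalize the "image"
    # Radon projection: for each angle, rotate the image and sum along one axis
    angles = np.linspace(0.0, 180.0, n_angles, endpoint=False)
    sinogram = np.stack(
        [rotate(img, a, reshape=False, order=1).sum(axis=0) for a in angles],
        axis=1,
    )
    # 2-D DCT of the sinogram; the low-order coefficients form the feature vector
    coeffs = dctn(sinogram, norm='ortho')
    return coeffs[:n_coeffs, :n_coeffs].ravel()

# Demo on a synthetic noisy tone (1 s at 8 kHz)
fs = 8000
t = np.arange(fs) / fs
sig = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(fs)
feat = radon_dct_features(sig, fs)
print(feat.shape)  # → (144,)
```

A feature vector of this form would then be fed to the classifier (here, the LSTM-RNN) in place of MFCC or raw-spectrum features; the abstract's claim is that the Radon/DCT representation degrades less under noise and reverberation.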



Author information

Correspondence to Samia Abd El-Moneim.



Cite this article

El-Moneim, S.A., El-Mordy, E.A., Nassar, M.A. et al. Performance enhancement of text-independent speaker recognition in noisy and reverberation conditions using Radon transform with deep learning. Int J Speech Technol 25, 679–687 (2022). https://doi.org/10.1007/s10772-021-09880-6
