Text-dependent and text-independent speaker recognition of reverberant speech based on CNN

El-Moneim, Samia Abd; Sedik, Ahmed; Nassar, M. A.; El-Fishawy, Adel S.; Sharshar, A. M.; Hassan, Shaimaa E. A.; Mahmoud, Adel Zaghloul; Dessouky, Moawd I.; El-Banby, Ghada M.; El-Samie, Fathi E. Abd; El-Rabaie, El-Sayed M.; Neyazi, Badawi; Seddeq, H. S.; Ismail, Nabil A.; Khalaf, Ashraf A. M.; Elabyad, G. S. M.

doi:10.1007/s10772-021-09805-3

Published: 08 June 2021

Volume 24, pages 993–1006, (2021)

International Journal of Speech Technology

Samia Abd El-Moneim¹,
Ahmed Sedik²,
M. A. Nassar³,
Adel S. El-Fishawy³,
A. M. Sharshar³,
Shaimaa E. A. Hassan³,
Adel Zaghloul Mahmoud⁴,
Moawd I. Dessouky³,
Ghada M. El-Banby⁵,
Fathi E. Abd El-Samie³,⁶,
El-Sayed M. El-Rabaie³,
Badawi Neyazi⁷,
H. S. Seddeq⁸,
Nabil A. Ismail⁹,
Ashraf A. M. Khalaf¹⁰ &
G. S. M. Elabyad³

Abstract

Speaker recognition is an important biometric recognition technique with numerous applications in security and telecommunications. The aim of a speaker recognition system is to identify who is speaking from the characteristics of the voice. This paper presents an extensive study of speaker recognition in both the text-dependent and text-independent cases. Convolutional Neural Network (CNN) based feature extraction is extended to both tasks. In addition, the effect of reverberation on the speaker recognition system is addressed. All speech signals are converted into images by computing their spectrograms. Two CNN models are proposed for efficient speaker recognition from clean and reverberant speech signals. They rely on image processing concepts applied to the spectrograms of the speech signals. One of the proposed models is compared with a conventional benchmark model in the text-independent scenario. The performance of the recognition system is measured by the recognition rate for both clean and reverberant speech.
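
The abstract does not state how the spectrogram images or the reverberant signals are produced, so the following is only a minimal sketch of the general idea, written in Python with NumPy. Everything specific in it is an assumption made for illustration: the function names `add_reverb` and `spectrogram_image`, the synthetic impulse response (exponentially decaying noise with a reverberation time `rt60`), the frame length, the hop size, and the 8-bit grey-level scaling. A windowed sine tone stands in for a speech recording.

```python
import numpy as np

def add_reverb(signal, sample_rate, rt60=0.5, seed=0):
    """Convolve a signal with a synthetic room impulse response.

    Assumption: the impulse response is white noise shaped by an
    exponential decay that drops by 60 dB after rt60 seconds.
    """
    rng = np.random.default_rng(seed)
    n = int(rt60 * sample_rate)
    t = np.arange(n) / sample_rate
    decay = np.exp(-np.log(1000.0) * t / rt60)  # factor 1000 = 60 dB
    impulse = rng.standard_normal(n) * decay
    impulse[0] = 1.0  # direct path
    wet = np.convolve(signal, impulse)[: len(signal)]
    return wet / np.max(np.abs(wet))  # normalise peak amplitude to 1

def spectrogram_image(signal, frame_len=256, hop=128):
    """Return a log-magnitude spectrogram as an 8-bit image (frequency x time)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T
    log_mag = 20.0 * np.log10(magnitude + 1e-10)  # small offset avoids log(0)
    lo, hi = log_mag.min(), log_mag.max()
    return np.round(255.0 * (log_mag - lo) / (hi - lo)).astype(np.uint8)

# Stand-in for a one-second speech recording sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
clean = np.sin(2 * np.pi * 440 * t) * np.hanning(len(t))

reverberant = add_reverb(clean, sample_rate)
clean_img = spectrogram_image(clean)
reverb_img = spectrogram_image(reverberant)
```

Both `clean_img` and `reverb_img` are two-dimensional arrays that could be passed to an image classifier such as a CNN. In the reverberant image, energy is smeared along the time axis, which is the kind of distortion a recognition model has to cope with.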

Author information

Authors and Affiliations

Communications and Electronics Department, Tanta High Institute of Engineering and Technology, Tanta, Egypt
Samia Abd El-Moneim
Department of Robotics and Intelligent Machines, Faculty of Artificial Intelligence, Kafrelsheikh University, Kafr Al Sheikh, Egypt
Ahmed Sedik
Department of Electronics and Electrical Communications, Faculty of Electronic Engineering, Menoufia University, Menouf, 32952, Egypt
M. A. Nassar, Adel S. El-Fishawy, A. M. Sharshar, Shaimaa E. A. Hassan, Moawd I. Dessouky, Fathi E. Abd El-Samie, El-Sayed M. El-Rabaie & G. S. M. Elabyad
Electronics and Communications Department, Faculty of Engineering, Zagazig University, Zagazig, Egypt
Adel Zaghloul Mahmoud
Automatic Control Department, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt
Ghada M. El-Banby
Department of Information Technology, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia
Fathi E. Abd El-Samie
Productivity and Vocational Training Department, Ministry of Industry, Cairo, Egypt
Badawi Neyazi
Acoustic Laboratory, Housing and Building National Research Center, Giza, Egypt
H. S. Seddeq
Department of Computer Science and Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf, 32952, Egypt
Nabil A. Ismail
Electrical Engineering Department, Faculty of Engineering, Minia University, Minia, Egypt
Ashraf A. M. Khalaf

Corresponding author

Correspondence to Fathi E. Abd El-Samie.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

El-Moneim, S.A., Sedik, A., Nassar, M.A. et al. Text-dependent and text-independent speaker recognition of reverberant speech based on CNN. Int J Speech Technol 24, 993–1006 (2021). https://doi.org/10.1007/s10772-021-09805-3

Received: 02 November 2019
Accepted: 25 December 2020
Published: 08 June 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s10772-021-09805-3
