Speaker identification based on normalized pitch frequency and Mel Frequency Cepstral Coefficients

Nasr, Marwa A.; Abd-Elnaby, Mohammed; El-Fishawy, Adel S.; El-Rabaie, S.; Abd El-Samie, Fathi E.

doi:10.1007/s10772-018-9524-7

Published: 17 September 2018

International Journal of Speech Technology, Volume 21, pages 941–951 (2018)

Abstract

This paper presents an efficient approach for automatic speaker identification based on cepstral features and the Normalized Pitch Frequency (NPF). Most relevant speaker identification methods adopt a cepstral strategy. Including the pitch frequency as an additional feature in the speaker identification process is expected to improve identification accuracy. The proposed framework uses a neural classifier with a single hidden layer. Different transform domains are investigated for reliable feature extraction from the speech signal. Moreover, a noise reduction pre-processing step is applied before feature extraction to enhance the performance of the speaker identification system. Simulation results show that using the NPF as a feature enhances the performance of the speaker identification system, especially with the Discrete Cosine Transform (DCT) and the wavelet denoising pre-processing step.
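The abstract combines cepstral features with a normalized pitch frequency but does not say how either is computed. The following is a minimal, pure-Python sketch of that kind of feature vector under stated assumptions: pitch is estimated by autocorrelation peak picking over an assumed 60–400 Hz range, the "normalization" is a hypothetical division by that 400 Hz ceiling, and the MFCCs follow the textbook recipe (Hamming window, power spectrum, 20 triangular mel filters, log, DCT-II). None of these choices is taken from the paper, and its neural classifier, transform-domain variants, and wavelet denoising step are not shown. The function names are illustrative.

```python
import math


def estimate_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate F0 (Hz) of one voiced frame by picking the autocorrelation
    peak among lags that correspond to the range [fmin, fmax]."""
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove DC before correlating
    lag_min = max(1, int(fs / fmax))
    lag_max = min(int(fs / fmin), n - 1)
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


def normalized_pitch(f0, fmax=400.0):
    """Hypothetical normalization: scale F0 by the search ceiling -> (0, 1]."""
    return f0 / fmax


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc(frame, fs, n_filters=20, n_ceps=13):
    """Textbook MFCCs for one frame: Hamming window, power spectrum,
    triangular mel filter bank, log, then an (unnormalized) DCT-II."""
    n = len(frame)
    win = [frame[i] * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
           for i in range(n)]
    # Power spectrum from a naive DFT over bins 0..n//2 (fine for a sketch).
    power = []
    for k in range(n // 2 + 1):
        re = sum(win[i] * math.cos(2 * math.pi * k * i / n) for i in range(n))
        im = sum(win[i] * math.sin(2 * math.pi * k * i / n) for i in range(n))
        power.append((re * re + im * im) / n)
    # Filter edges equally spaced on the mel scale between 0 Hz and fs/2.
    top = hz_to_mel(fs / 2.0)
    edges = [j * top / (n_filters + 1) for j in range(n_filters + 2)]
    bins = [min(n // 2, int(mel_to_hz(m) * n / fs)) for m in edges]
    log_energies = []
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        e = 0.0
        for k in range(left, centre):  # rising slope of the triangle
            e += power[k] * (k - left) / (centre - left)
        for k in range(centre, right + 1):  # peak and falling slope
            w = (right - k) / (right - centre) if right > centre else 1.0
            e += power[k] * w
        log_energies.append(math.log(e + 1e-10))  # small floor avoids log(0)
    # DCT-II decorrelates the log filter-bank energies into cepstral coefficients.
    return [sum(log_energies[m] * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m in range(n_filters))
            for c in range(n_ceps)]


def feature_vector(frame, fs, n_ceps=13):
    """MFCCs with the (hypothetically) normalized pitch appended as one extra dimension."""
    return mfcc(frame, fs, n_ceps=n_ceps) + [normalized_pitch(estimate_pitch(frame, fs))]


if __name__ == "__main__":
    fs = 8000
    tone = [math.sin(2 * math.pi * 200 * i / fs) for i in range(400)]  # 50 ms, 200 Hz
    feats = feature_vector(tone, fs)
    print(len(feats), round(feats[-1], 3))  # prints: 14 0.5
```

Each frame yields a 14-dimensional vector that a single-hidden-layer classifier could consume. In practice an FFT routine would replace the naive DFT, and unvoiced frames would need separate handling, because autocorrelation peak picking always returns some lag even when no pitch is present.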

Author information

Authors and Affiliations

Department of Electronics and Electrical Communications, Faculty of Electronic Engineering, Menoufia University, Menouf, 32952, Egypt
Marwa A. Nasr, Mohammed Abd-Elnaby, Adel S. El-Fishawy, S. El-Rabaie & Fathi E. Abd El-Samie

Corresponding author

Correspondence to Marwa A. Nasr.

About this article

Cite this article

Nasr, M.A., Abd-Elnaby, M., El-Fishawy, A.S. et al. Speaker identification based on normalized pitch frequency and Mel Frequency Cepstral Coefficients. Int J Speech Technol 21, 941–951 (2018). https://doi.org/10.1007/s10772-018-9524-7

Received: 27 May 2017
Accepted: 09 June 2018
Issue Date: 15 December 2018