Speaker identification based on normalized pitch frequency and Mel Frequency Cepstral Coefficients

International Journal of Speech Technology

Abstract

This paper presents an efficient approach for automatic speaker identification based on cepstral features and the Normalized Pitch Frequency (NPF). Most existing speaker identification methods rely on cepstral features. Including the pitch frequency as an additional feature is expected to improve identification accuracy. The proposed framework uses a neural classifier with a single hidden layer. Different transform domains are investigated for reliable feature extraction from the speech signal. Moreover, a noise reduction pre-processing step is applied before feature extraction to improve system performance. Simulation results show that adding the NPF as a feature improves the performance of the speaker identification system, especially when combined with the Discrete Cosine Transform (DCT) and a wavelet denoising pre-processing step.
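As a rough illustration of how a pitch feature could be attached to cepstral features, the sketch below estimates the pitch of each frame and appends a normalized value to a per-frame feature matrix. It is not the authors' implementation: the abstract does not state the pitch detector or the normalization rule, so the autocorrelation-based estimator, the division by the maximum pitch, and the function names `estimate_pitch_autocorr`, `normalized_pitch`, and `append_npf` are all assumptions made for this example.

```python
import numpy as np

def estimate_pitch_autocorr(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate the pitch (Hz) of one frame from its autocorrelation peak.

    Assumption: a plain autocorrelation detector stands in for whatever
    pitch detection method the paper actually uses.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - np.mean(frame)
    # Full autocorrelation; keep the non-negative lags only.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(ac) - 1)
    if ac[0] <= 0 or lag_max <= lag_min:
        return 0.0  # silent or too-short frame: report no pitch
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / lag

def normalized_pitch(pitches):
    """Scale pitch values by their maximum (an assumed normalization)."""
    pitches = np.asarray(pitches, dtype=float)
    peak = pitches.max()
    return pitches / peak if peak > 0 else pitches

def append_npf(cepstral_features, pitches):
    """Append one normalized pitch value per frame as an extra column."""
    npf = normalized_pitch(pitches)[:, None]
    return np.hstack([np.asarray(cepstral_features, dtype=float), npf])
```

Given a matrix of cepstral coefficients with one row per frame, `append_npf` returns the same matrix with one extra column, which could then be passed to a classifier such as the single-hidden-layer network mentioned in the abstract.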

Author information

Corresponding author

Correspondence to Marwa A. Nasr.

About this article

Cite this article

Nasr, M.A., Abd-Elnaby, M., El-Fishawy, A.S. et al. Speaker identification based on normalized pitch frequency and Mel Frequency Cepstral Coefficients. Int J Speech Technol 21, 941–951 (2018). https://doi.org/10.1007/s10772-018-9524-7

  • DOI: https://doi.org/10.1007/s10772-018-9524-7
