Text dependant speaker recognition using MFCC, LPC and DWT

International Journal of Speech Technology

Abstract

The objective of this work is to investigate the benefit of the discrete wavelet transform (DWT) combined with linear predictive coding (LPC) for a speaker identification system applied to the Algerian Berber language, compared with traditional Mel-frequency analysis. We have developed a speaker identification system for the Algerian Berber language. The corpus comprises two datasets: the first contains eight isolated words and the second contains continuous speech, both repeated by native Algerian Berber speakers. We used MFCC features with their first and second derivatives, and DWT followed by LPC, to improve the parameterization phase. Mahalanobis distance, ascendant classification and pitch analysis were used to characterize the speech signals. We evaluated the performance of the DWT–LPC features on clean speech and on speech with additive noise. A multilayer perceptron classifier was used for this purpose; efficiency improved when the DWT was combined with LPC feature vectors.
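The "DWT followed by LPC" parameterization named in the abstract can be illustrated with a minimal sketch. The abstract does not state the wavelet family, the decomposition depth or the LPC order, so the Haar wavelet, `levels=2` and `order=8` below are assumptions chosen only for illustration, and the function names `haar_dwt`, `lpc` and `dwt_lpc_features` are hypothetical. The sketch decomposes one speech frame into sub-bands and computes LPC coefficients (autocorrelation method with the Levinson-Durbin recursion) on each sub-band.

```python
import numpy as np

def haar_dwt(x):
    """One level of the Haar DWT: returns (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                      # pad to even length by repeating the last sample
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def lpc(x, order):
    """LPC coefficients a[0..order] with a[0] = 1 (autocorrelation + Levinson-Durbin)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:                  # silent or degenerate frame: stop early
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                  # reflection coefficient
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def dwt_lpc_features(frame, levels=2, order=8):
    """Concatenate the LPC coefficients of every DWT sub-band of one frame.

    levels and order are illustrative assumptions, not values from the paper.
    """
    feats = []
    approx = np.asarray(frame, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        feats.append(lpc(detail, order)[1:])   # drop the leading a[0] = 1
    feats.append(lpc(approx, order)[1:])
    return np.concatenate(feats)

# Usage on a synthetic 256-sample frame (stand-in for a windowed speech frame)
rng = np.random.default_rng(0)
frame = rng.standard_normal(256)
features = dwt_lpc_features(frame)
print(features.shape)  # (24,) = (levels + 1) sub-bands x order coefficients
```

In a full system, one such vector per frame would be passed to the classifier; the abstract names a multilayer perceptron for that stage, which is not shown here.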

Author information

Corresponding author

Correspondence to Fatma Zohra Chelali.

About this article

Cite this article

Chelali, F.Z., Djeradi, A. Text dependant speaker recognition using MFCC, LPC and DWT. Int J Speech Technol 20, 725–740 (2017). https://doi.org/10.1007/s10772-017-9441-1

  • DOI: https://doi.org/10.1007/s10772-017-9441-1
