Text dependant speaker recognition using MFCC, LPC and DWT

Chelali, Fatma Zohra; Djeradi, Amar

doi:10.1007/s10772-017-9441-1

Published: 26 July 2017

Volume 20, pages 725–740 (2017)

International Journal of Speech Technology

Fatma Zohra Chelali¹ &
Amar Djeradi¹

Abstract

The objective of this work is to investigate the benefit of the discrete wavelet transform (DWT) combined with linear predictive coding (LPC), compared with traditional Mel-frequency analysis, in a speaker identification system for the Algerian Berber language. The corpus comprises two datasets: the first contains eight isolated words and the second contains continuous speech, both repeated by native Algerian Berber speakers. We use Mel-frequency cepstral coefficients (MFCC) with their first and second derivatives, and DWT followed by LPC, to improve the parameterization phase. Mahalanobis distance, ascendant classification and pitch analysis are used to characterize the speech signals. We evaluate the performance of the DWT–LPC features on clean speech and on speech with additive noise. A multilayer perceptron classifier is used for this purpose; the system's efficiency improved with the DWT–LPC feature vectors.
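
As an illustration of the DWT followed by LPC parameterization described in the abstract, the following is a minimal sketch rather than the authors' implementation. The abstract does not state the wavelet family, the number of decomposition levels, the LPC order or the frame length, so the Haar wavelet, `levels=3`, `order=8` and the 256-sample frame are assumptions chosen for illustration, and the function names `haar_dwt`, `lpc` and `dwt_lpc_features` are hypothetical.

```python
import numpy as np


def haar_dwt(x):
    """One level of the Haar DWT; returns (approximation, detail).

    The Haar wavelet is an assumption: the abstract names no wavelet family.
    """
    x = np.asarray(x, dtype=float)
    if len(x) % 2:  # pad to even length by repeating the last sample
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    approx = (even + odd) / np.sqrt(2.0)
    detail = (even - odd) / np.sqrt(2.0)
    return approx, detail


def lpc(x, order):
    """LPC coefficients a[1..order] (autocorrelation method, Levinson-Durbin).

    Predictor convention: x[n] is approximated by sum_k a[k] * x[n - k].
    """
    x = np.asarray(x, dtype=float)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order)
    err = r[0]  # prediction error energy of the order-0 model
    if err <= 0.0:
        return a  # silent frame: return all-zero coefficients
    for i in range(order):
        # Reflection coefficient for model order i + 1
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err
        updated = a.copy()
        updated[i] = k
        updated[:i] = a[:i] - k * a[:i][::-1]
        a = updated
        err *= 1.0 - k * k
        if err <= 0.0:
            break
    return a


def dwt_lpc_features(frame, levels=3, order=8):
    """Concatenate the LPC coefficients of every DWT sub-band of one frame."""
    feats = []
    approx = np.asarray(frame, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        feats.append(lpc(detail, order))  # detail band at this level
    feats.append(lpc(approx, order))      # final approximation band
    return np.concatenate(feats)


# Usage: one synthetic 256-sample frame stands in for a windowed speech frame.
rng = np.random.default_rng(0)
frame = rng.standard_normal(256)
features = dwt_lpc_features(frame)  # length = (levels + 1) * order
```

Under these assumptions, each frame yields a vector of `(levels + 1) * order` coefficients, which could then be passed to a classifier such as a multilayer perceptron.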

Author information

Authors and Affiliations

Speech Communication and Signal Processing Laboratory, Faculty of Electronics Engineering and Computer Science, University of Science and Technology Houari Boumedienne (USTHB), Box no: 32 El Alia, 16111, Algiers, Algeria
Fatma Zohra Chelali & Amar Djeradi

Corresponding author

Correspondence to Fatma Zohra Chelali.

About this article

Cite this article

Chelali, F.Z., Djeradi, A. Text dependant speaker recognition using MFCC, LPC and DWT. Int J Speech Technol 20, 725–740 (2017). https://doi.org/10.1007/s10772-017-9441-1

Received: 09 May 2017
Accepted: 17 July 2017
Published: 26 July 2017
Issue Date: September 2017
DOI: https://doi.org/10.1007/s10772-017-9441-1
