Abstract
The objective of this work is to investigate the benefit of the discrete wavelet transform (DWT) combined with linear predictive coding (LPC) for a speaker identification system applied to the Algerian Berber language, compared with traditional Mel-frequency analysis. We have developed a speaker identification system for the Algerian Berber language. The corpus comprises two datasets: the first contains eight isolated words, and the second is dedicated to continuous speech repeated by native Algerian Berber speakers. We used Mel-frequency cepstral coefficient (MFCC) features with their first and second derivatives, as well as the DWT followed by LPC, to improve the parameterization phase. Mahalanobis distance, ascendant classification and pitch analysis were used to characterize our speech signals. We evaluate the performance of DWT–LPC features on clean speech and on speech corrupted by additive noise. A multilayer perceptron classifier was used for this purpose; efficiency improved when the DWT was combined with LPC feature vectors.
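The "DWT followed by LPC" parameterization can be illustrated with a small sketch: decompose a speech frame into wavelet subbands, then fit an LPC model to each subband and concatenate the coefficients. The abstract does not specify the details, so everything here is an assumption: a Haar wavelet, three decomposition levels, LPC order 4 per subband (computed by the autocorrelation method with the Levinson-Durbin recursion), and plain concatenation into one feature vector. The function names are hypothetical, and the code is pure Python only to stay self-contained; it is not the authors' implementation.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    n = len(signal) - len(signal) % 2  # drop a trailing odd sample
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, n, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, n, 2)]
    return approx, detail


def wavelet_subbands(signal, levels):
    """Multi-level decomposition returning [d1, d2, ..., dL, aL]."""
    bands = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        bands.append(detail)
    bands.append(approx)
    return bands


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] (autocorrelation method)."""
    return [sum(x[i] * x[i + k] for i in range(len(x) - k))
            for k in range(max_lag + 1)]


def lpc(x, order):
    """LPC coefficients a[1..p] via Levinson-Durbin.

    Convention: x[n] + sum_k a[k] * x[n-k] = e[n] (prediction error).
    """
    r = autocorrelation(x, order)
    if r[0] == 0:
        return [0.0] * order  # silent band: no predictable structure
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
        if err <= 0:  # perfectly predictable: keep remaining coeffs at zero
            break
    return a[1:]


def dwt_lpc_features(frame, levels=3, order=4):
    """Hypothetical DWT-LPC feature vector: LPC of each subband, concatenated."""
    features = []
    for band in wavelet_subbands(frame, levels):
        features.extend(lpc(band, order))
    return features


if __name__ == "__main__":
    # Synthetic 256-sample frame (440 Hz tone at an assumed 8 kHz sampling rate).
    frame = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(256)]
    feats = dwt_lpc_features(frame)
    print(len(feats))  # 16 = (3 detail bands + 1 approximation) * order 4
```

With these assumed settings each frame yields (levels + 1) × order coefficients. The per-subband LPC models capture the spectral envelope separately in each wavelet band, which is one plausible motivation for pairing the two techniques; in practice the wavelet family, number of levels and LPC order would be tuned on the corpus.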
Cite this article
Chelali, F.Z., Djeradi, A. Text dependant speaker recognition using MFCC, LPC and DWT. Int J Speech Technol 20, 725–740 (2017). https://doi.org/10.1007/s10772-017-9441-1