Abstract
Most of the existing speaker recognition systems are based on the basic GMM, the state of the art GMM-UBM, the SVM or more recently the GMM-SVM modeling. In this paper, a new scheme for Automatic Speaker Recognition (ASR), namely GMM-PCA-SVM, is presented. Dimensionality reduction using Principal Component Analysis (PCA) technique, which was previously applied in the front-end process, is now incorporated in the core of the GMM-SVM modeling part, in order to reduce the size of the adapted means vectors issued from the Universal Background Model (UBM). A Comparative study, using Mel Frequency Cepstral Coefficients (MFCC) with Cepstral Mean Subtraction (CMS) extracted from the TIMIT database is performed for speaker recognition in clean and noisy environments. It is shown that the proposed scheme is a promising way for the ASR task. In fact, the recognition performances using GMM-PCA-SVM proposed method is significantly improved compared to the conventional SVM or GMM-SVM based systems.
Similar content being viewed by others
References
Burges, C. J. C. (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2), 1–47.
Campbell, W., Sturim, D., Reynolds, D. A., & Solomonoff, A. (2006). SVM based speaker verification using a GMM supervector kernel and Nap variability compensation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toulouse, France (pp. 97–100).
Campbell, J. P, Jr. (1997). Speaker recognition: a tutorial. In Pro. IEEE, 85(9), 1437–1462.
Chitturi, R., & Hansen, J. H. L. (2007). Multi-stream dialect classification using SVM-GMM hybrid classifiers. In IEEE Workshop on Automatic Speech Recognition & Understanding, Kyoto, Japan (pp. 431–436).
Garofolo, J., Lamel, L., Fisher, W., Fiscus, J., Pallett, D., Dahlgren, N., et al. (1993). TIMIT acoustic-phonetic continuous speech corpus. Philadelphia: Linguistic Data Consortium.
Hanilci, C., & Ertas, F. (2011). VQ-UBM based speaker verification through dimension reduction using local PCA, In 19th European Signal Processing conference, Spain (pp. 1303–1306).
Harrag, A., Mohamadi, T., & Harrag N. (2011). LDA fusing of acoustic and prosodic features: application to speaker recognition. In Colloquium on Humanities, Science and Engineering Research, Penang (pp. 245–248).
Izquierdo-Verdiguier, E., Gomez-Chova, L., Bruzzone, L., & Camps-Valls, G. (2014). Semisupervised kernel feature extraction for remote sensing image analysis. IEEE transactions on geoscience and remote sensing, PP(99), 1–12.
Jiang, J., Wu, Z., Xu, M., Jia, J., & Cai, L. (2013). Comparing feature dimension reduction algorithms for GMM-SVM based speech emotion recognition, In Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, Kaohsiung, China (pp. 1–4).
Jokic, I., Jokic, S., Gnjatovic, M., Delic, V., & Peric, Z. (2012). Influence of the number of principal components used to the automatic speaker recognition accuracy. Electronics & Electrical Engineering, 123, 83–86.
Jolliffe, I. T. (2010). Principal component analysis (2nd ed.). New York, NY: Springer-Verlag.
Karam, Z. N., & Campbell, W. M. (2008). A multi-class MLLR kernel for SVM speaker recognition. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Las Vegas, USA (pp. 4117–4120).
Kinnunen, T., & Li, H. (2010). An overview of text-independent speaker recognition: from features to supervectors. Speech Communication, 52, 12–40.
Kresimir delac, K., Grgic, M., & liatsis, P. (2005). Appearance based statistical methods for face recognition. In 47th international symposium EL-MAR, Zadar, Croatia (pp. 151–158).
Kuncheva, L. I., & Faithfull, W. J. (2014). PCA feature extraction for change detection in multidimensional unlabeled data. IEEE transactions on neural networks and learning systems, 25(1), 69–80.
Lee, K. Y. (2004). Local fuzzy PCA based GMM with dimension reduction on speaker identification. Pattern Recognition Letters, 25, 1811–1817.
Li, H., & Dong, Y. (2013). EigenVoice used in speaker recognition with a few training samples. Advanced Materials Research, 823, 618–621.
Malarvizhi, A., & Sivasarathadevi, K. (2013). Performance analysis of HDM and PCA, ICA In teeth image recognition, In Proceedings of International Conference on Optical Imaging Sensor and Security, Coimbatore, India (pp. 1–5).
Minkyung, K., Eunyoung, K., Changwoo, S., & Sungchae, J. (2010). Speaker verification and identification using principal component analysis based on global eigenvector matrix. Hybrid Artificial Intelligence Systems, 6076, 278–285.
Reynolds, D. A., Quatieri, T. F., & Dunn, R. B. (2000). Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, 10(1–3), 19–41.
Vapnik, V. (1998). Statistical learning theory. New York: John Wiley.
Wan, V., & Renals, S. (2003). SVMSVM: support vector machine speaker verification methodology. In IEEE International Conference on Acoustics, Speech, and Signal Processing Proceedings (ICASSP), Hong Kong, China (pp. 221–224).
Yun, L., & Hansen, J.H.L. (2009). Factor analysis-based information integration for Arabic dialect identification. In IEEE International Conference on Acoustics, Speech and Signal Processing, Taipei, Taïwan (pp. 4337–4340).
Yun, L., & Hansen, J. H. L. (2011). Dialect classification via text-independent training and testing for Arabic, Spanish, and Chinese. Audio, Speech, and Language Processing, IEEE Transactions on Biometrics Compendium, 19(1), 85–96.
Zhang, C., & Zheng, T.F. (2013). A fishervoice based feature fusion method for short utterance speaker recognition. In 2013 IEEE China Summit and International Conference on Signal and Information Processing, Beijing, China (pp. 165–169).
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Zergat, K.Y., Amrouche, A. New scheme based on GMM-PCA-SVM modelling for automatic speaker recognition. Int J Speech Technol 17, 373–381 (2014). https://doi.org/10.1007/s10772-014-9235-7
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10772-014-9235-7