New scheme based on GMM-PCA-SVM modelling for automatic speaker recognition

Zergat, Kawthar Yasmine; Amrouche, Abderrahmane

doi:10.1007/s10772-014-9235-7

Published: 14 May 2014

International Journal of Speech Technology, volume 17, pages 373–381 (2014)

Abstract

Most existing speaker recognition systems are based on the basic GMM, the state-of-the-art GMM-UBM, the SVM, or, more recently, GMM-SVM modeling. This paper presents a new scheme for Automatic Speaker Recognition (ASR), namely GMM-PCA-SVM. Dimensionality reduction using Principal Component Analysis (PCA), previously applied in the front-end processing, is now incorporated into the core of the GMM-SVM modeling stage in order to reduce the size of the adapted mean vectors derived from the Universal Background Model (UBM). A comparative study, using Mel Frequency Cepstral Coefficients (MFCC) with Cepstral Mean Subtraction (CMS) extracted from the TIMIT database, is performed for speaker recognition in clean and noisy environments. The results show that the proposed scheme is promising for the ASR task: recognition performance with the proposed GMM-PCA-SVM method improves significantly over conventional SVM or GMM-SVM based systems.
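
The pipeline described in the abstract (CMS on the cepstral frames, a UBM, adapted mean vectors stacked into one vector per utterance, PCA on those vectors, then an SVM) can be illustrated with a minimal sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: scikit-learn as the toolkit, synthetic frames standing in for TIMIT MFCCs, mean-only MAP adaptation with a relevance factor of 16, 8 UBM components, 10 principal components, and a linear SVM kernel.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC


def cepstral_mean_subtraction(frames):
    """Subtract the per-utterance mean of each cepstral coefficient."""
    return frames - frames.mean(axis=0, keepdims=True)


def map_adapt_means(ubm, frames, relevance=16.0):
    """Mean-only MAP adaptation of the UBM to one utterance.

    The relevance factor is an assumed value, not taken from the paper.
    """
    post = ubm.predict_proba(frames)             # (T, C) responsibilities
    counts = post.sum(axis=0)                    # (C,) soft counts
    first = post.T @ frames                      # (C, D) first-order stats
    data_mean = first / np.maximum(counts, 1e-10)[:, None]
    alpha = (counts / (counts + relevance))[:, None]
    return alpha * data_mean + (1.0 - alpha) * ubm.means_


def supervector(ubm, frames):
    """Stack the adapted means of one utterance into a single vector."""
    adapted = map_adapt_means(ubm, cepstral_mean_subtraction(frames))
    return adapted.ravel()


# Synthetic stand-in for MFCC frames: each hypothetical speaker has two
# clusters at +/- a speaker-specific centre, so speakers still differ
# after CMS removes the utterance mean.
rng = np.random.default_rng(0)
N_SPK, DIM, N_FRAMES = 4, 12, 200
centers = rng.normal(scale=2.0, size=(N_SPK, DIM))


def fake_utterance(spk):
    signs = rng.choice([-1.0, 1.0], size=(N_FRAMES, 1))
    return signs * centers[spk] + rng.normal(size=(N_FRAMES, DIM))


train = [(fake_utterance(s), s) for s in range(N_SPK) for _ in range(6)]
test = [(fake_utterance(s), s) for s in range(N_SPK) for _ in range(2)]

# UBM trained on the pooled, mean-normalised training frames.
ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
ubm.fit(np.vstack([cepstral_mean_subtraction(x) for x, _ in train]))

X_train = np.array([supervector(ubm, x) for x, _ in train])
y_train = np.array([s for _, s in train])
X_test = np.array([supervector(ubm, x) for x, _ in test])
y_test = np.array([s for _, s in test])

# PCA sits between the supervectors and the SVM, as in GMM-PCA-SVM.
pca = PCA(n_components=10).fit(X_train)
svm = SVC(kernel="linear").fit(pca.transform(X_train), y_train)

pred = svm.predict(pca.transform(X_test))
accuracy = float((pred == y_test).mean())
print(f"supervector dim: {X_train.shape[1]}, reduced dim: {pca.n_components_}")
print(f"identification accuracy on synthetic data: {accuracy:.2f}")
```

In this sketch the SVM sees 10-dimensional inputs instead of the 96-dimensional stacked means, which is the role the abstract assigns to PCA. Accuracy on the synthetic data says nothing about the results reported for TIMIT.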

Author information

Authors and Affiliations

Speech Com. & Signal Proc. Lab.-LCPTS, Faculty of Electronics and Computer Sciences, USTHB, Bab Ezzouar, 16 111, Algeria
Kawthar Yasmine Zergat & Abderrahmane Amrouche

Corresponding author

Correspondence to Kawthar Yasmine Zergat.

About this article

Cite this article

Zergat, K.Y., Amrouche, A. New scheme based on GMM-PCA-SVM modelling for automatic speaker recognition. Int J Speech Technol 17, 373–381 (2014). https://doi.org/10.1007/s10772-014-9235-7

Received: 19 October 2013
Accepted: 21 April 2014
Published: 14 May 2014
Issue Date: December 2014
DOI: https://doi.org/10.1007/s10772-014-9235-7
