Robust regression fusion of GMM-UBM and GMM-SVM normalized scores using G729 bit-stream for speaker recognition over IP

Yessad, Dalila; Amrouche, Abderrahmane

doi:10.1007/s10772-013-9204-6

Published: 31 July 2013

Volume 17, pages 43–51 (2014)

International Journal of Speech Technology

Dalila Yessad¹ &
Abderrahmane Amrouche¹

Abstract

A novel approach based on robust regression with normalized score fusion, called Normalized Scores following Robust Regression Fusion (NSRRF), is proposed to enhance speaker recognition over IP networks. It can be used in both Network Speaker Recognition (NSR) and Distributed Speaker Recognition (DSR) systems. In this framework, speech is assumed to be encoded by the G729 coder on the client side and then transmitted to the server side, where the ASR systems are located. The Gaussian Mixture Model with Universal Background Model (GMM-UBM) and the Gaussian supervector SVM (GMM-SVM), both with normalized scores, are used for speaker recognition. The feature vectors consist of Mel Frequency Cepstral Coefficients (MFCC) and Linear Prediction Cepstral Coefficients (LPCC), both derived from the Line Spectral Pairs (LSP) extracted from the G729 bit-stream over IP. Experiments conducted on the ARADIGITS database with the LIA SpkDet system, which is based on the ALIZE platform, show first that the proposed method, using features extracted directly from the G729 bit-stream, significantly reduces the error rate and outperforms the baseline ASR-over-IP system, which relies on the resynthesized (reconstructed) speech obtained from the G729 decoder. In addition, the results show that the proposed approach, based on score normalization followed by robust regression fusion, achieves the best result and outperforms conventional ASR over IP networks.
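
The abstract does not spell out which score normalization or which robust estimator is used. As a minimal sketch only, assuming z-score normalization of each subsystem's scores and a Huber M-estimator fitted by iteratively reweighted least squares (both are illustrative choices, not the authors' confirmed method), fusing GMM-UBM and GMM-SVM scores could look like the following. All function names and the toy scores are hypothetical.

```python
import math

def z_norm(scores):
    """Shift and scale scores to zero mean and unit standard deviation."""
    mean = sum(scores) / len(scores)
    var = sum((s - mean) ** 2 for s in scores) / len(scores)
    std = math.sqrt(var) or 1.0  # guard against constant scores
    return [(s - mean) / std for s in scores]

def solve(a, b):
    """Solve a small linear system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def huber_fit(rows, targets, delta=1.345, iters=30):
    """Robust linear regression: Huber weights via iteratively reweighted least squares."""
    p = len(rows[0])
    weights = [1.0] * len(rows)
    coef = [0.0] * p
    for _ in range(iters):
        # Weighted normal equations (X^T W X) coef = X^T W y.
        a = [[sum(w * r[i] * r[j] for w, r in zip(weights, rows))
              for j in range(p)] for i in range(p)]
        b = [sum(w * r[i] * t for w, r, t in zip(weights, rows, targets))
             for i in range(p)]
        coef = solve(a, b)
        resid = [t - sum(c * x for c, x in zip(coef, r))
                 for r, t in zip(rows, targets)]
        # Robust scale estimate from the median absolute residual.
        mad = sorted(abs(e) for e in resid)[len(resid) // 2]
        scale = mad / 0.6745 or 1e-12
        # Huber weights: 1 for small residuals, down-weighted for large ones.
        weights = [min(1.0, delta * scale / abs(e)) if e else 1.0 for e in resid]
    return coef

def fuse(coef, ubm, svm):
    """Fused score = intercept + weighted sum of the two normalized scores."""
    return [coef[0] + coef[1] * u + coef[2] * v for u, v in zip(ubm, svm)]

# Toy data (assumption): 10 target trials (label 1) and 10 impostor trials (label 0).
labels = [1.0] * 10 + [0.0] * 10
ubm_raw = [2.0 + 0.1 * i for i in range(10)] + [-1.0 + 0.1 * i for i in range(10)]
ubm_raw[15] = 4.0  # one outlying impostor score in the GMM-UBM stream
svm_raw = [1.0 + 0.05 * i for i in range(10)] + [-0.5 - 0.05 * i for i in range(10)]

ubm, svm = z_norm(ubm_raw), z_norm(svm_raw)
rows = [[1.0, u, v] for u, v in zip(ubm, svm)]
coef = huber_fit(rows, labels)
fused = fuse(coef, ubm, svm)
print("fusion coefficients:", coef)
```

The Huber weights reduce the influence of trials with large residuals, such as the outlying impostor score above, which is the usual motivation for preferring a robust fit over ordinary least squares when combining scores.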

Author information

Authors and Affiliations

LCPTS, Speech Communication and Signal Processing Lab., Faculty of Electronics and Computer Sciences, USTHB, P.O. Box 32 El Alia, Bab Ezzouar, 16111, Alger
Dalila Yessad & Abderrahmane Amrouche

Corresponding author

Correspondence to Abderrahmane Amrouche.

About this article

Cite this article

Yessad, D., Amrouche, A. Robust regression fusion of GMM-UBM and GMM-SVM normalized scores using G729 bit-stream for speaker recognition over IP. Int J Speech Technol 17, 43–51 (2014). https://doi.org/10.1007/s10772-013-9204-6

Received: 16 February 2013
Accepted: 24 June 2013
Published: 31 July 2013
Issue Date: March 2014
DOI: https://doi.org/10.1007/s10772-013-9204-6
