
Speaker recognition using pyramid match kernel based support vector machines

International Journal of Speech Technology

Abstract

Gaussian mixture model (GMM) based approaches are commonly used for speaker recognition tasks. The parameters of GMMs can be estimated with methods such as expectation-maximization, which is a non-discriminative learning method. Discriminative classifier based approaches to speaker recognition include support vector machine (SVM) based classifiers that use dynamic kernels such as the generalized linear discriminant sequence kernel, the probabilistic sequence kernel, the GMM supervector kernel, the GMM-UBM mean interval (GUMI) kernel and the intermediate matching kernel.

Recently, the pyramid match kernel (PMK), which uses grids in the feature space as histogram bins, and the vocabulary-guided PMK (VGPMK), which uses clusters in the feature space as histogram bins, have been proposed for recognition of objects in an image represented as a set of local feature vectors. In PMK, a set of feature vectors is mapped onto a multi-resolution histogram pyramid. The kernel between a pair of examples is computed by comparing their pyramids using a weighted histogram intersection function at each level of the pyramid. We propose to use the PMK-based SVM classifier for speaker identification and verification, with the speech signal of an utterance represented as a set of local feature vectors. The main issue in building the PMK-based SVM classifier is the construction of the pyramid of histograms. We first propose the codebook-based PMK (CBPMK), which forms hard clusters using the k-means clustering method, with the number of clusters increasing from one level of the pyramid to the next. We then propose the GMM-based PMK (GMMPMK), which uses soft clustering.

We compare the performance of the GMM-based approaches with that of the PMK-based and other dynamic kernel SVM-based approaches to speaker identification and verification. The 2002 and 2003 NIST speaker recognition corpora are used to evaluate the different approaches. Results of our studies show that the dynamic kernel SVM-based approaches give significantly better performance than the state-of-the-art GMM-based approaches. For the speaker recognition task, the GMMPMK-based SVM gives a performance that is better than that of SVMs using many other dynamic kernels and comparable to that of SVMs using a state-of-the-art dynamic kernel, the GUMI kernel. The storage requirements of the GMMPMK-based SVMs are lower than those of SVMs using any other dynamic kernel.
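
The pyramid construction and weighted histogram intersection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy 1-D codebooks (standing in for k-means centroids), the level weights of 2^j, and the "new matches" differencing between adjacent levels are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]
Codebook = List[Vector]  # one centroid per histogram bin


def nearest(codebook: Codebook, x: Vector) -> int:
    """Index of the centroid closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda q: sum((a - b) ** 2 for a, b in zip(codebook[q], x)),
    )


def histogram(codebook: Codebook, vectors: Sequence[Vector]) -> List[int]:
    """Hard-assignment histogram: count the vectors falling in each cluster."""
    counts = [0] * len(codebook)
    for x in vectors:
        counts[nearest(codebook, x)] += 1
    return counts


def intersection(h: Sequence[int], g: Sequence[int]) -> int:
    """Histogram intersection: number of matches between two histograms."""
    return sum(min(a, b) for a, b in zip(h, g))


def pyramid_match(pyramid: List[Codebook], xs: Sequence[Vector],
                  ys: Sequence[Vector], weights: Sequence[float]) -> float:
    """Weighted sum of matches over pyramid levels (coarsest level first).

    Assumption: matches already counted at the next finer level are
    subtracted, so each level contributes only its new matches.
    """
    s = [intersection(histogram(cb, xs), histogram(cb, ys)) for cb in pyramid]
    total = 0.0
    for j, w in enumerate(weights):
        finer = s[j + 1] if j + 1 < len(s) else 0
        total += w * (s[j] - finer)
    return total


# Hypothetical 3-level pyramid with 1, 2 and 4 clusters (toy 1-D centroids).
pyramid = [
    [[0.5]],
    [[0.25], [0.75]],
    [[0.125], [0.375], [0.625], [0.875]],
]
weights = [1.0, 2.0, 4.0]          # assumed: finer levels weigh more
xs = [[0.1], [0.3], [0.8]]         # toy "utterance" with 3 feature vectors
ys = [[0.2], [0.6], [0.9], [0.7]]  # toy "utterance" with 4 feature vectors

print(pyramid_match(pyramid, xs, ys, weights))  # 9.0
```

In this toy case the bins are nested, so the per-level differences are non-negative. With codebooks trained independently at each level, that is not guaranteed. A soft-clustering variant in the spirit of GMMPMK would presumably replace the hard counts in `histogram` with sums of component posterior probabilities.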

Corresponding author

Correspondence to A. D. Dileep.

About this article

Cite this article

Dileep, A.D., Sekhar, C.C. Speaker recognition using pyramid match kernel based support vector machines. Int J Speech Technol 15, 365–379 (2012). https://doi.org/10.1007/s10772-012-9154-4

  • DOI: https://doi.org/10.1007/s10772-012-9154-4
