Exploring Data-Independent Dimensionality Reduction in Sparse Representation-Based Speaker Identification

Circuits, Systems, and Signal Processing

Abstract

Sparse representation classification (SRC) has attracted attention in many signal processing domains in the past few years. Recently, it has been successfully explored for the speaker recognition task with Gaussian mixture model (GMM) mean supervectors as speaker representations, whose dimension is typically of the order of tens of thousands. As a result, the complexity of such systems becomes very high. With the use of the state-of-the-art i-vector representations, the dimension of GMM mean supervectors can be reduced effectively. However, the i-vector approach involves a high-dimensional data projection matrix, which is learned using factor analysis over a huge amount of data from a large number of speakers. Also, the estimation of the i-vector for a given utterance involves a computationally complex procedure. Motivated by these facts, we explore the use of data-independent projection approaches for reducing the dimensionality of GMM mean supervectors. The data-independent projection methods studied in this work include a normal random projection and two kinds of sparse random projections. The study is performed on SRC-based speaker identification using the NIST SRE 2005 dataset, which includes channel-matched and mismatched conditions. We find that the use of data-independent random projections for the dimensionality reduction of the supervectors results in only a 3% absolute loss in performance compared to that of the data-dependent (i-vector) approach. We highlight that with the use of highly sparse random projection matrices having \(\pm 1\) as non-zero coefficients, a significant reduction in computational complexity is achieved in finding the projections. Further, as these matrices do not require floating-point representations, their storage requirement is also very small compared to that of the data-dependent or the normal random projection matrices. These reduced-complexity sparse random projections would be of interest in the context of speaker recognition applications implemented on platforms with low computational power.
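
The three kinds of data-independent projection mentioned above can be illustrated with a minimal NumPy sketch. It rests on several assumptions that the abstract does not confirm: the two sparse variants are taken to be the commonly used constructions in which entries are \(\pm 1\) with probability \(1/(2s)\) each and zero otherwise, with \(s = 3\) for the sparse case and \(s = \sqrt{D}\) for the very sparse case; the dimensions are toy values far smaller than a real supervector; and all function and variable names are hypothetical.

```python
import numpy as np


def normal_projection_matrix(d_in, d_out, rng):
    """Dense Gaussian projection matrix; needs floating-point storage."""
    return rng.standard_normal((d_out, d_in)) / np.sqrt(d_out)


def sparse_sign_matrix(d_in, d_out, s, rng):
    """Sign matrix with entries +1 or -1 (probability 1/(2s) each), else 0.

    The entries are stored as int8, so no floating-point storage is needed.
    """
    values = np.array([-1, 0, 1], dtype=np.int8)
    probs = [1.0 / (2 * s), 1.0 - 1.0 / s, 1.0 / (2 * s)]
    return rng.choice(values, size=(d_out, d_in), p=probs)


def project_sparse(signs, x, s):
    """Project x with a sign matrix.

    The scale sqrt(s / d_out) is applied once to the result so that the
    expected squared norm of the projection equals that of x.
    """
    d_out = signs.shape[0]
    return np.sqrt(s / d_out) * (signs @ x)


rng = np.random.default_rng(0)
d_in, d_out = 2000, 500          # toy sizes, assumed for illustration only
x = rng.standard_normal(d_in)    # stand-in for a supervector

# Normal (Gaussian) random projection.
normal_matrix = normal_projection_matrix(d_in, d_out, rng)
y_normal = normal_matrix @ x

# Sparse (s = 3) and very sparse (s = sqrt(d_in)) random projections.
s_sparse = 3.0
s_very = float(np.sqrt(d_in))
signs_sparse = sparse_sign_matrix(d_in, d_out, s_sparse, rng)
signs_very = sparse_sign_matrix(d_in, d_out, s_very, rng)
y_sparse = project_sparse(signs_sparse, x, s_sparse)
y_very = project_sparse(signs_very, x, s_very)

# Ratio of projected norm to original norm; values near 1 indicate that
# the projection roughly preserves vector length.
norm_ratios = {
    "normal": np.linalg.norm(y_normal) / np.linalg.norm(x),
    "sparse": np.linalg.norm(y_sparse) / np.linalg.norm(x),
    "very_sparse": np.linalg.norm(y_very) / np.linalg.norm(x),
}
```

In this sketch the sign matrices occupy one byte per entry, against eight bytes per entry for the float64 Gaussian matrix, and a sparse storage format would shrink them further because most entries are zero. Multiplying by \(\pm 1\) entries also reduces, in principle, to additions and subtractions followed by a single scaling, which is one way to read the complexity and storage claims made in the abstract.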

Acknowledgments

This work has been supported by the ongoing project “Development of speech based multilevel authentication system” sponsored by the Department of Information Technology, Government of India. The first author thanks the Linguistic Data Consortium (LDC) for providing access to the NIST SRE-2005 database through the Fall-2011 Database Scholarship award.

Author information

Corresponding author

Correspondence to B. C. Haris.

About this article

Cite this article

Haris, B.C., Sinha, R. Exploring Data-Independent Dimensionality Reduction in Sparse Representation-Based Speaker Identification. Circuits Syst Signal Process 33, 2521–2538 (2014). https://doi.org/10.1007/s00034-014-9757-x

  • DOI: https://doi.org/10.1007/s00034-014-9757-x
