Exploring Data-Independent Dimensionality Reduction in Sparse Representation-Based Speaker Identification

Haris, B. C.; Sinha, Rohit

doi:10.1007/s00034-014-9757-x

Published: 01 April 2014

Volume 33, pages 2521–2538 (2014)

Circuits, Systems, and Signal Processing

Abstract

Sparse representation classification (SRC) has attracted attention in many signal processing domains in the past few years. Recently, it has been successfully explored for the speaker recognition task using Gaussian mixture model (GMM) mean supervectors, whose dimension is typically of the order of tens of thousands, as speaker representations. As a result, the complexity of such systems becomes very high. With state-of-the-art i-vector representations, the dimension of GMM mean supervectors can be reduced effectively. However, the i-vector approach involves a high-dimensional data projection matrix that is learned by factor analysis over a huge amount of data from a large number of speakers. In addition, estimating the i-vector for a given utterance is a computationally complex procedure. Motivated by these facts, we explore data-independent projection approaches for reducing the dimensionality of GMM mean supervectors. The data-independent projection methods studied in this work include a normal random projection and two kinds of sparse random projections. The study is performed on SRC-based speaker identification using the NIST SRE 2005 dataset, which includes channel-matched and mismatched conditions. We find that using data-independent random projections for the dimensionality reduction of the supervectors results in only a 3% absolute loss in performance compared to the data-dependent (i-vector) approach. With highly sparse random projection matrices having \(\pm 1\) as non-zero coefficients, a significant reduction in computational complexity is achieved in finding the projections. Further, as these matrices do not require floating-point representations, their storage requirement is also very small compared to that of the data-dependent or the normal random projection matrices. These reduced-complexity sparse random projections would be of interest for speaker recognition applications implemented on platforms with low computational power.
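
The sparse random projection described in the abstract can be illustrated with a minimal sketch. It assumes the construction commonly attributed to Achlioptas and to Li, Hastie and Church, in which each matrix entry is +1 or -1 with probability 1/(2s) each and 0 otherwise; the abstract does not state the exact parameters, so the choice s = sqrt(d), the dimensions, and the function names `sparse_sign_matrix` and `project` are illustrative assumptions rather than the authors' implementation.

```python
import math
import random


def sparse_sign_matrix(k, d, s, seed=0):
    """Build a k x d sparse sign matrix, stored row by row.

    Each entry is +1 or -1 with probability 1/(2s) each, and 0 otherwise.
    Only the non-zero column indices and their signs are kept, so no
    floating-point matrix is stored.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(k):
        row = []
        for j in range(d):
            u = rng.random()
            if u < 1.0 / (2 * s):
                row.append((j, 1))
            elif u < 1.0 / s:
                row.append((j, -1))
        rows.append(row)
    return rows


def project(rows, x, s):
    """Project x with additions and subtractions only, then rescale once.

    The factor sqrt(s / k) keeps the expected squared norm of the
    projection equal to the squared norm of x.
    """
    scale = math.sqrt(s / len(rows))
    return [scale * sum(x[j] if sign > 0 else -x[j] for j, sign in row)
            for row in rows]


# Toy dimensions (assumed); real supervectors have tens of thousands of entries.
d, k = 2000, 200
s = math.sqrt(d)  # assumed "very sparse" setting; s = 3 gives a denser variant
rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(d)]  # stand-in for a GMM mean supervector

rows = sparse_sign_matrix(k, d, s)
y = project(rows, x, s)
```

With s = sqrt(d), only about one entry in sqrt(d) is non-zero, so each projected coordinate costs a few dozen additions in this toy setting instead of d multiplications, which is the source of the complexity and storage savings the abstract highlights.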

Acknowledgments

This work has been supported by the ongoing project “Development of speech based multilevel authentication system” sponsored by the Department of Information Technology, Government of India. The first author thanks the Linguistic Data Consortium (LDC) for providing access to the NIST SRE-2005 database through the Fall-2011 Database Scholarship award.

Author information

Authors and Affiliations

Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati, 781039, India
B. C. Haris & Rohit Sinha

Corresponding author

Correspondence to B. C. Haris.

About this article

Cite this article

Haris, B.C., Sinha, R. Exploring Data-Independent Dimensionality Reduction in Sparse Representation-Based Speaker Identification. Circuits Syst Signal Process 33, 2521–2538 (2014). https://doi.org/10.1007/s00034-014-9757-x

Received: 01 May 2013
Revised: 06 February 2014
Accepted: 07 February 2014
Published: 01 April 2014
Issue Date: August 2014
DOI: https://doi.org/10.1007/s00034-014-9757-x
