Abstract
Sparse representation classification (SRC) has attracted attention in many signal processing domains in the past few years. Recently, it has been successfully explored for the speaker recognition task using Gaussian mixture model (GMM) mean supervectors as speaker representations; these supervectors typically have dimensions on the order of tens of thousands. As a result, the complexity of such systems becomes very high. With the state-of-the-art i-vector representation, the dimension of GMM mean supervectors can be reduced effectively. However, the i-vector approach involves a high-dimensional data projection matrix that is learned by factor analysis over a huge amount of data from a large number of speakers. In addition, estimating the i-vector for a given utterance is a computationally complex procedure. Motivated by these facts, we explore the use of data-independent projection approaches for reducing the dimensionality of GMM mean supervectors. The data-independent projection methods studied in this work are a normal random projection and two kinds of sparse random projections. The study is performed on SRC-based speaker identification using the NIST SRE 2005 dataset, which includes channel-matched and channel-mismatched conditions. We find that using data-independent random projections to reduce the dimensionality of the supervectors results in only a 3% absolute loss in performance compared to the data-dependent (i-vector) approach. With highly sparse random projection matrices having \(\pm 1\) as non-zero coefficients, a significant reduction in the computational complexity of finding the projections is achieved. Further, as these matrices do not require floating-point representations, their storage requirement is also very small compared to that of the data-dependent or the normal random projection matrices. These reduced-complexity sparse random projections would be of interest for speaker recognition applications implemented on platforms with low computational power.
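The data-independent projections described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the dimensions, and the sparsity parameter `s` are assumptions chosen for illustration. The sparse matrices follow the commonly used construction in which entries take the values \(+1\), \(0\), \(-1\) with probabilities \(1/(2s)\), \(1 - 1/s\), \(1/(2s)\); the choices \(s = 3\) and \(s = \sqrt{d}\) used below are assumed, since the abstract does not state which two sparsity levels were studied.

```python
import numpy as np


def normal_projection(d, k, rng):
    """Dense k x d matrix with i.i.d. N(0, 1) entries (stored as float64)."""
    return rng.standard_normal((k, d))


def sparse_sign_projection(d, k, s, rng):
    """k x d matrix with entries +1, 0, -1 drawn with probabilities
    1/(2s), 1 - 1/s, 1/(2s). Stored as int8, so no floats are needed."""
    p = 1.0 / (2.0 * s)
    values = np.array([1, 0, -1], dtype=np.int8)
    return rng.choice(values, size=(k, d), p=[p, 1.0 - 2.0 * p, p])


def project(R, x, s=1.0):
    """Map x from d to k dimensions. The factor sqrt(s / k) keeps the
    expected squared length of x unchanged (use s = 1 for the normal case)."""
    k = R.shape[0]
    return np.sqrt(s / k) * (R @ x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k = 5000, 200                  # hypothetical supervector / reduced sizes
    x = rng.standard_normal(d)        # stand-in for a GMM mean supervector

    s_very_sparse = np.sqrt(d)        # assumed "very sparse" setting
    candidates = {
        "normal": (normal_projection(d, k, rng), 1.0),
        "sparse (s=3)": (sparse_sign_projection(d, k, 3.0, rng), 3.0),
        "very sparse": (sparse_sign_projection(d, k, s_very_sparse, rng),
                        s_very_sparse),
    }
    for name, (R, s) in candidates.items():
        y = project(R, x, s)
        ratio = np.linalg.norm(y) / np.linalg.norm(x)
        print(f"{name}: {R.nbytes} bytes, norm ratio {ratio:.3f}")
```

Because the non-zero entries are only \(\pm 1\), the product `R @ x` reduces in principle to additions and subtractions of the selected components of `x`, and the matrix fits in one byte per entry (or less with a sparse index format) instead of eight bytes per float64 entry.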
Acknowledgments
This work has been supported by the ongoing project “Development of speech based multilevel authentication system” sponsored by the Department of Information Technology, Government of India. The first author thanks the Linguistic Data Consortium (LDC) for providing access to the NIST SRE-2005 database through the Fall-2011 Database Scholarship award.
Cite this article
Haris, B.C., Sinha, R. Exploring Data-Independent Dimensionality Reduction in Sparse Representation-Based Speaker Identification. Circuits Syst Signal Process 33, 2521–2538 (2014). https://doi.org/10.1007/s00034-014-9757-x
DOI: https://doi.org/10.1007/s00034-014-9757-x