Speaker Modeling Technique Based on Regression Class for Speaker Identification with Sparse Training

Fu, Zhonghua; Zhao, Rongchun

doi:10.1007/978-3-540-30548-4_70

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3338)

Abstract

Speaker modeling with sparse training data is an active branch of robust speaker recognition research. This paper presents a novel modeling approach, the Multi-EigenSpace modeling technique based on Regression Class (RC-MES), which integrates the common eigenspace technique with the regression class (RC) idea of Maximum Likelihood Linear Regression (MLLR). RC-MES not only addresses the limited prior knowledge of Gaussian Mixture Models (GMM) but also remedies a shortcoming of the common eigenspace, which confounds speaker differences with phoneme differences. Eigenvoice analysis within each RC can provide better discrimination between speakers. Experimental results on speaker identification with 75 male speakers show that, when enrolment data is sparse, RC-MES provides a significant improvement over GMM, and that RC-MES needs fewer eigenvoices than the common eigenspace.
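The abstract does not spell out how the per-class eigenspaces are built, so the following is only a minimal sketch of the general idea: one PCA-style eigenvoice analysis per regression class instead of a single eigenspace over the whole supervector. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names `class_eigenvoices` and `project`, the random stand-in data, the fixed assignment of mixture components to two classes, and the use of plain SVD.

```python
import numpy as np

def class_eigenvoices(supervectors, class_indices, n_eigen):
    """Run a separate PCA on each regression class of the supervectors.

    supervectors:  (n_speakers, dim) array, one stacked GMM-mean vector per speaker.
    class_indices: list of index arrays, one per class (hypothetical grouping).
    Returns a list of (mean, basis) pairs; basis has shape (n_eigen, class_dim).
    """
    spaces = []
    for idx in class_indices:
        block = supervectors[:, idx]
        mean = block.mean(axis=0)
        # Rows of vt are the principal directions of the centered block.
        _, _, vt = np.linalg.svd(block - mean, full_matrices=False)
        spaces.append((mean, vt[:n_eigen]))
    return spaces

def project(supervector, class_indices, spaces):
    """Coordinates of one speaker in each class eigenspace."""
    return [basis @ (supervector[idx] - mean)
            for idx, (mean, basis) in zip(class_indices, spaces)]

# Stand-in data: 20 speakers, 8 mixture components, 3-dimensional features.
rng = np.random.default_rng(0)
n_speakers, n_mix, feat_dim = 20, 8, 3
supervectors = rng.normal(size=(n_speakers, n_mix * feat_dim))

# Assumed grouping: components 0-3 form class 0, components 4-7 form class 1.
component_class = np.array([0, 0, 0, 0, 1, 1, 1, 1])
dim_class = np.repeat(component_class, feat_dim)
class_indices = [np.flatnonzero(dim_class == c) for c in range(2)]

spaces = class_eigenvoices(supervectors, class_indices, n_eigen=3)
coords = project(supervectors[0], class_indices, spaces)
```

In this sketch each speaker is described by one short coordinate vector per class rather than one vector in a single shared eigenspace. In a real system the grouping would come from a regression class tree or similar clustering, which is not shown here.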

Author information

Authors and Affiliations

School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072, P.R. China
Zhonghua Fu & Rongchun Zhao

Editor information

Editors and Affiliations

Center for Biometrics and Security Research & National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences,
Stan Z. Li
Department of Electronics & Communication Engineering, Sun Yat-Sen University, Guangzhou, China
Jianhuang Lai
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Center of Computer Vision, School of Mathematics and Computing Science, Sun Yat-sen University, 510275, Guangzhou, China
Guocan Feng
School of Computer Science and Engineering, Beihang University, Beijing, China
Yunhong Wang

Cite this paper

Fu, Z., Zhao, R. (2004). Speaker Modeling Technique Based on Regression Class for Speaker Identification with Sparse Training. In: Li, S.Z., Lai, J., Tan, T., Feng, G., Wang, Y. (eds) Advances in Biometric Person Authentication. SINOBIOMETRICS 2004. Lecture Notes in Computer Science, vol 3338. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30548-4_70

DOI: https://doi.org/10.1007/978-3-540-30548-4_70
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-24029-7
Online ISBN: 978-3-540-30548-4
eBook Packages: Computer Science, Computer Science (R0)