Speaker Recognition in Orthogonal Complement of Time Session Variability Subspace

Tsuge, Satoru; Kuroiwa, Shingo

doi:10.1007/978-3-319-92231-7_11

Conference paper
First Online: 12 June 2018

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 98)

Abstract

Time session variability between the enrollment data and the data to be recognized degrades speaker recognition performance, which makes it one of the most important issues in speaker recognition technology. In this paper, we propose a speaker recognition method that is robust to time session variability. The proposed method first estimates a time session variability subspace and then carries out speaker recognition in the orthogonal complement of that subspace. In addition, we incorporate linear discriminant analysis into the proposed method. To evaluate the proposed method, we conducted a speaker identification experiment. The experimental results show that the proposed method improves speaker identification performance over the baseline.
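The idea of recognizing speakers in the orthogonal complement of a session variability subspace can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each utterance is already represented by a fixed-length vector, that the subspace is estimated by a principal component analysis of within-speaker deviations across sessions, and that identification uses cosine similarity. The function names, the toy data, and the `rank` parameter are hypothetical, and the linear discriminant analysis step mentioned in the abstract is omitted.

```python
import numpy as np


def estimate_session_subspace(vectors, speaker_ids, rank):
    """Return an orthonormal basis (dim x rank) of within-speaker variation.

    Assumption: session variability is approximated by the top principal
    directions of each utterance vector's deviation from its speaker mean.
    """
    vectors = np.asarray(vectors, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    deviations = []
    for spk in np.unique(speaker_ids):
        rows = vectors[speaker_ids == spk]
        deviations.append(rows - rows.mean(axis=0))
    deviations = np.vstack(deviations)
    # Rows of vt are the principal directions, sorted by singular value.
    _, _, vt = np.linalg.svd(deviations, full_matrices=False)
    return vt[:rank].T


def project_to_complement(vectors, basis):
    """Remove the component of each row that lies in span(basis)."""
    vectors = np.asarray(vectors, dtype=float)
    return vectors - (vectors @ basis) @ basis.T


def identify(enroll, test, basis):
    """Return, for each test vector, the index of the closest enrolled speaker."""
    e = project_to_complement(enroll, basis)
    t = project_to_complement(test, basis)
    # Cosine similarity; assumes no vector lies entirely inside the subspace.
    e = e / np.linalg.norm(e, axis=1, keepdims=True)
    t = t / np.linalg.norm(t, axis=1, keepdims=True)
    return np.argmax(t @ e.T, axis=1)


# Toy data (hypothetical): the first axis carries session drift, the other
# two axes carry speaker identity. Two speakers, two sessions each.
dev_vectors = [[2, 1, 0], [-2, 1, 0], [2, 0, 1], [-2, 0, 1]]
dev_speakers = [0, 0, 1, 1]
basis = estimate_session_subspace(dev_vectors, dev_speakers, rank=1)

enroll = np.array([[2.0, 1.0, 0.0], [-2.0, 0.0, 1.0]])  # speakers 0 and 1
test = np.array([[2.0, 0.0, 1.0]])  # speaker 1, recorded in another session

baseline = identify(enroll, test, np.zeros((3, 0)))  # empty basis: no projection
proposed = identify(enroll, test, basis)
```

In this toy setting the session drift dominates the cosine score, so scoring without projection matches the test vector to the wrong speaker, while scoring in the orthogonal complement recovers the correct one. Real utterance vectors would need a much larger development set with several sessions per speaker, and the subspace rank would have to be tuned.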

Acknowledgments

This work was supported by JSPS KAKENHI Grant Number JP16K00229.

Author information

Authors and Affiliations

Daido University, Nagoya, Aichi, 457-8530, Japan
Satoru Tsuge
Chiba University, Chiba, Chiba, 263-8522, Japan
Shingo Kuroiwa

Corresponding author

Correspondence to Satoru Tsuge.

Editor information

Editors and Affiliations

Istituto di Calcolo e Reti ad Alte Prestazioni (ICAR), National Research Council, Roma, Italy
Giuseppe De Pietro
Istituto di Calcolo e Reti ad Alte Prestazioni (ICAR), National Research Council, Roma, Italy
Luigi Gallo
Bournemouth University, Poole, United Kingdom
Robert J. Howlett
Centre for Artificial Intelligence, Faculty of Engineering and Information Technology, University of Technology Sydney, Sydney, New South Wales, Australia
Lakhmi C. Jain
Griffith Sciences - Centres and Institutes, Griffith University, South Brisbane, Queensland, Australia
Ljubo Vlacic

About this paper

Cite this paper

Tsuge, S., Kuroiwa, S. (2019). Speaker Recognition in Orthogonal Complement of Time Session Variability Subspace. In: De Pietro, G., Gallo, L., Howlett, R., Jain, L., Vlacic, L. (eds) Intelligent Interactive Multimedia Systems and Services. KES-IIMSS-18 2018. Smart Innovation, Systems and Technologies, vol 98. Springer, Cham. https://doi.org/10.1007/978-3-319-92231-7_11

DOI: https://doi.org/10.1007/978-3-319-92231-7_11
Published: 12 June 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92230-0
Online ISBN: 978-3-319-92231-7
eBook Packages: Intelligent Technologies and Robotics (R0)
