Eigenvoices: A compact representation of speakers in model space

Nguyen, Patrick; Kuhn, Roland; Junqua, Jean-Claude; Niedzielski, Nancy; Wellekens, Christian

doi:10.1007/BF03001909

Voix propres : vers une représentation compacte des locuteurs dans l’espace des modèles (Eigenvoices: towards a compact representation of speakers in model space)

Published: March 2000

Volume 55, pages 163–171 (2000)

Annales des Télécommunications

Patrick Nguyen¹,², Roland Kuhn¹, Jean-Claude Junqua¹, Nancy Niedzielski¹ & Christian Wellekens²

Abstract

In this article, we present a new approach to modeling speaker-dependent systems. The approach was inspired by the eigenfaces technique used in face recognition. We build a linear vector space of low dimensionality, called the eigenspace, in which speakers are located. The basis vectors of this space are called eigenvoices. Each eigenvoice models a direction of inter-speaker variability. The eigenspace is built during the training phase. Then, any speaker model can be expressed as a linear combination of eigenvoices. The main benefit of this technique, as set forth in this article, is a reduction in the number of parameters needed to describe a model. As a result, fewer parameters need to be estimated, and computation and/or storage costs are reduced. We apply the approach to speaker adaptation and speaker recognition, and report some experimental results.
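The eigenspace construction described in the abstract can be sketched in a few lines. This is a minimal, illustrative sketch, not the authors' implementation, and it rests on several assumptions: each training speaker is already summarized by one "supervector" (for example, the concatenated Gaussian means of a speaker-dependent model); eigenvoices are extracted with principal component analysis (the technique behind eigenfaces), computed here by power iteration with deflation; the mean supervector is treated as a fixed offset; and a new speaker is placed in the eigenspace by orthogonal projection of a fully known supervector, rather than by the likelihood-based weight estimation one would need with sparse adaptation data. The function names (`build_eigenspace`, `project`, `reconstruct`) are hypothetical.

```python
import math
import random


def mean_vector(vectors):
    """Component-wise mean of equal-length supervectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def covariance(vectors, mu):
    """Covariance matrix (normalized by n) of the mean-centered supervectors."""
    n, d = len(vectors), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for v in vectors:
        c = [x - m for x, m in zip(v, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += c[i] * c[j] / n
    return cov


def top_eigenvectors(cov, k, iters=500, seed=0):
    """Leading k eigenvectors of a symmetric PSD matrix (power iteration + deflation)."""
    d = len(cov)
    a = [row[:] for row in cov]  # copy, so deflation leaves cov untouched
    rng = random.Random(seed)
    basis = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # no variability left in any remaining direction
                return basis
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue (variance along this eigenvoice).
        lam = sum(v[i] * sum(a[i][j] * v[j] for j in range(d)) for i in range(d))
        basis.append(v)
        # Deflate so the next pass finds the next direction of variability.
        for i in range(d):
            for j in range(d):
                a[i][j] -= lam * v[i] * v[j]
    return basis


def build_eigenspace(supervectors, k):
    """Training phase (hypothetical helper): mean supervector plus k eigenvoices."""
    mu = mean_vector(supervectors)
    return mu, top_eigenvectors(covariance(supervectors, mu), k)


def project(supervector, mu, eigenvoices):
    """k weights locating a speaker in eigenspace (orthogonal projection)."""
    centered = [x - m for x, m in zip(supervector, mu)]
    return [sum(e_i * c_i for e_i, c_i in zip(e, centered)) for e in eigenvoices]


def reconstruct(weights, mu, eigenvoices):
    """Speaker model = mean + linear combination of eigenvoices."""
    model = list(mu)
    for w, e in zip(weights, eigenvoices):
        for i, e_i in enumerate(e):
            model[i] += w * e_i
    return model
```

With D-dimensional supervectors and K eigenvoices, a speaker is described by K weights instead of D parameters. In a toy run where six 6-dimensional training supervectors lie on a 2-D plane, `build_eigenspace(train, 2)` recovers that plane, and a new speaker lying on it is reconstructed from just two weights. Pure Python is used only to keep the sketch dependency-free; a practical system would compute the eigenvoices with an SVD routine.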

Résumé

This article presents a new approach inspired by image recognition, adapted and applied to speech. A vector space of reduced dimension, called the eigenspace (espace propre), is built, and speakers are confined to it. The basis vectors of this space are called eigenvoices (voix propres). Each eigenvoice models one component of inter-speaker variability. The eigenspace is built during the training phase that is standard for speech-related systems. A speaker model is then associated with a linear combination of the vectors of this reduced speaker space. The advantage of this method, highlighted in the article, is the reduction in the number of parameters that characterize a model. As a result, the number of parameters to estimate is reduced, as are computation time and/or storage. Here the technique is applied to speaker adaptation for automatic speech recognition and to automatic speaker recognition. Some experimental results are presented.
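For the speaker-recognition application, one simple way to exploit the compact representation is to compare speakers directly by their eigenspace weights. The sketch below is an assumption for illustration, not the scoring method reported in the article: it takes a mean supervector and orthonormal eigenvoices as given (for instance from a PCA training step as described above), enrolls each speaker as a K-dimensional weight vector, and identifies a test supervector by the nearest enrolled weights in Euclidean distance. The names `enroll` and `identify` and the toy numbers are hypothetical.

```python
import math


def project(supervector, mu, eigenvoices):
    """Eigenspace weights via orthogonal projection (eigenvoices assumed orthonormal)."""
    centered = [x - m for x, m in zip(supervector, mu)]
    return [sum(e_i * c_i for e_i, c_i in zip(e, centered)) for e in eigenvoices]


def enroll(speakers, mu, eigenvoices):
    """Store only K weights per enrolled speaker instead of a full supervector."""
    return {name: project(sv, mu, eigenvoices) for name, sv in speakers.items()}


def identify(test_supervector, enrolled, mu, eigenvoices):
    """Return the enrolled speaker whose eigenspace weights are closest (Euclidean)."""
    w = project(test_supervector, mu, eigenvoices)
    return min(enrolled, key=lambda name: math.dist(w, enrolled[name]))
```

As a usage example, with `mu = [0.0] * 4` and the two axis-aligned eigenvoices `[1, 0, 0, 0]` and `[0, 1, 0, 0]`, enrolling `alice` at `[2.0, 0.0, 0.3, -0.1]` and `bob` at `[-1.0, 1.5, 0.0, 0.2]` stores two weights per speaker, and the test supervector `[1.8, 0.1, 5.0, 5.0]` is identified as `alice`, because the projection discards components outside the eigenspace. Storing K weights per speaker is where the storage savings described in the résumé would come from in such a scheme.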

Author information

Authors and Affiliations

Speech Technology Laboratory, Panasonic Technologies, 3888 State Street, Suite #202, Santa Barbara, CA 93105, USA
Patrick Nguyen, Roland Kuhn, Jean-Claude Junqua & Nancy Niedzielski
Institut Eurecom, Route des Crêtes, BP 193-2229, F-06904, Sophia Antipolis Cedex, France
Patrick Nguyen & Christian Wellekens

Corresponding authors

Correspondence to Patrick Nguyen, Roland Kuhn, Jean-Claude Junqua, Nancy Niedzielski or Christian Wellekens.

About this article

Cite this article

Nguyen, P., Kuhn, R., Junqua, JC. et al. Eigenvoices: A compact representation of speakers in model space. Ann. Télécommun. 55, 163–171 (2000). https://doi.org/10.1007/BF03001909

Received: 15 October 1999
Accepted: 02 February 2000
Issue Date: March 2000
DOI: https://doi.org/10.1007/BF03001909
