Eigenvoices: A compact representation of speakers in model space

Voix propres : vers une représentation compacte des locuteurs dans l’espace des modèles

Annales Des Télécommunications

Abstract

In this article, we present a new approach to modeling speaker-dependent systems, inspired by the eigenface technique used in face recognition. We build a low-dimensional linear vector space, called the eigenspace, in which speakers are located. The basis vectors of this space are called eigenvoices; each eigenvoice models a direction of inter-speaker variability. The eigenspace is built during the training phase, after which any speaker model can be expressed as a linear combination of eigenvoices. The main benefit set forth in this article is a reduction in the number of parameters that describe a model, which in turn reduces the number of parameters to estimate as well as computation and/or storage costs. We apply the approach to speaker adaptation and speaker recognition, and report some experimental results.
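The idea of expressing a speaker model as a linear combination of eigenvoices can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration, not taken from the article: each speaker model is flattened into a fixed-length "supervector", the eigenspace is obtained by principal component analysis (computed via an SVD, by analogy with eigenfaces), the weights for a new speaker are found by plain orthogonal projection, and the data are synthetic. The abstract does not say how the weights are estimated in practice, so the projection step is only a stand-in.

```python
import numpy as np

def build_eigenspace(supervectors, k):
    """Return the mean supervector and the top-k eigenvoices (rows, orthonormal).

    supervectors: array of shape (n_speakers, dim), one flattened model per speaker.
    """
    mean = supervectors.mean(axis=0)
    centered = supervectors - mean
    # Rows of vt are the principal directions of inter-speaker variability.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def project(supervector, mean, eigenvoices):
    """Coordinates of a speaker in the eigenspace (k weights instead of dim parameters)."""
    return eigenvoices @ (supervector - mean)

def reconstruct(weights, mean, eigenvoices):
    """Speaker model as the mean plus a linear combination of eigenvoices."""
    return mean + weights @ eigenvoices

# Synthetic example (hypothetical sizes): 20 training speakers, 200 model
# parameters each, all lying in a 3-dimensional affine subspace.
rng = np.random.default_rng(0)
n_speakers, dim, k = 20, 200, 3
true_basis = rng.normal(size=(k, dim))
offset = rng.normal(size=dim)
supervectors = offset + rng.normal(size=(n_speakers, k)) @ true_basis

mean, eigenvoices = build_eigenspace(supervectors, k)

# A new speaker drawn from the same subspace is described by only k weights.
new_speaker = offset + rng.normal(size=k) @ true_basis
weights = project(new_speaker, mean, eigenvoices)
approx = reconstruct(weights, mean, eigenvoices)
```

In this toy setting the new speaker is recovered from 3 weights rather than 200 parameters, which is the kind of parameter reduction the abstract describes. Real speaker models would not lie exactly in a low-dimensional subspace, so the reconstruction would only be approximate.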

Résumé (English translation)

This article presents a new approach, inspired by image recognition, adapted and applied to speech. A vector space of reduced dimension, called the eigenspace, is built, within which speakers are confined. The basis vectors of this space are called eigenvoices. Each eigenvoice models one component of inter-speaker variability. The eigenspace is built during the conventional training phase of speech systems. A speaker model is then associated with a linear combination of the vectors of the reduced speaker space. The advantage of this method, highlighted in the article, is the reduction in the number of parameters that characterize a model. As a result, the number of parameters to estimate is reduced, as are computation time and/or storage. The technique is applied here to speaker adaptation and to automatic speaker recognition. Some experimental results are presented.



Author information

Correspondence to Patrick Nguyen, Roland Kuhn, Jean-Claude Junqua, Nancy Niedzielski or Christian Wellekens.

About this article

Cite this article

Nguyen, P., Kuhn, R., Junqua, JC. et al. Eigenvoices: A compact representation of speakers in model space. Ann. Télécommun. 55, 163–171 (2000). https://doi.org/10.1007/BF03001909

  • DOI: https://doi.org/10.1007/BF03001909
