Dynamic Assignment of Gaussian Components in Modelling Speech Spectra

Zolfaghari, Parham; Kato, Hiroko; Minami, Yasuhiro; Nakamura, Atsushi; Katagiri, Shigeru; Patterson, Roy

doi:10.1007/s11265-006-9768-3

Parham Zolfaghari¹, Hiroko Kato¹, Yasuhiro Minami¹, Atsushi Nakamura¹, Shigeru Katagiri¹ & Roy Patterson²

Abstract

In this paper, we describe a parametric mixture model for the resonant characteristics of the vocal tract, in which Gaussian distributions model spectral frequency regions. A mixture of Gaussians (MoG) parametrisation scheme is used to model a smoothed representation of the spectra. The smoothing procedure removes all signal periodicity from the spectra, allowing highly natural analysis, manipulation and synthesis of speech. The goals of the parametrisation scheme are to ease the correspondence between the resonant characteristics of the vocal tract and the parametric distributions, and to model the spectrum with an appropriate number of parameters. A maximum likelihood (ML) approach to this parametrisation scheme was introduced previously; however, that approach has inherent local optima problems. Noting that a relatively small class of Gaussian densities can approximate a large class of distributions, we propose a new scheme: starting with a large number of distributions in the mixture, we systematically reduce their number and re-approximate the densities in the mixture based on a distance criterion. The Kullback-Leibler (KL) distance was found to allow optimal MoG solutions to the spectra. Furthermore, a fitness measure based on KL information is used to provide a figure for estimating the model order in representing formant-like features. The proposed model is subjectively evaluated and is shown to reduce the number of Gaussians with an appreciable loss in the quality of the re-synthesised speech.
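
The component-reduction idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not say how components are selected or re-approximated, so the sketch assumes univariate components stored as (weight, mean, variance) tuples, a symmetrised KL distance as the pairing criterion, and a moment-preserving merge of the closest pair. All function names are hypothetical.

```python
import math


def kl_gauss(m1, v1, m2, v2):
    """KL(N(m1, v1) || N(m2, v2)) for univariate Gaussians (v1, v2 are variances)."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def sym_kl(a, b):
    """Symmetrised KL distance between two (weight, mean, variance) components.

    Assumption: the weights are ignored when measuring the distance.
    """
    _, m1, v1 = a
    _, m2, v2 = b
    return kl_gauss(m1, v1, m2, v2) + kl_gauss(m2, v2, m1, v1)


def merge(a, b):
    """Replace two components by one that preserves their combined
    weight, mean and variance (moment matching)."""
    w1, m1, v1 = a
    w2, m2, v2 = b
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    v = (w1 * (v1 + (m1 - m) ** 2) + w2 * (v2 + (m2 - m) ** 2)) / w
    return (w, m, v)


def reduce_mixture(components, target):
    """Greedily merge the closest pair until only `target` components remain."""
    comps = list(components)
    while len(comps) > max(target, 1):
        pairs = ((i, j) for i in range(len(comps)) for j in range(i + 1, len(comps)))
        i, j = min(pairs, key=lambda p: sym_kl(comps[p[0]], comps[p[1]]))
        merged = merge(comps[i], comps[j])
        comps = [c for k, c in enumerate(comps) if k not in (i, j)] + [merged]
    return sorted(comps, key=lambda c: c[1])  # order by mean (frequency)


# Illustrative values only: means in Hz, variances in Hz^2.
mixture = [(0.25, 500.0, 100.0 ** 2), (0.25, 520.0, 100.0 ** 2), (0.5, 1500.0, 200.0 ** 2)]
reduced = reduce_mixture(mixture, target=2)
```

Here the two heavily overlapping low-frequency components are merged into one, while the well-separated component near 1500 Hz is left unchanged. A full scheme would also need a stopping rule for the model order, such as the KL-based fitness measure the abstract mentions, which this sketch does not attempt.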

Author information

Authors and Affiliations

Speech Open Lab, NTT Communication Science Labs, NTT Corporation, Kyoto, Japan
Parham Zolfaghari, Hiroko Kato, Yasuhiro Minami, Atsushi Nakamura & Shigeru Katagiri
Centre for the Neural Basis of Hearing, Department of Physiology, University of Cambridge, Downing Street, Cambridge, UK
Roy Patterson

Corresponding author

Correspondence to Parham Zolfaghari.

About this article

Cite this article

Zolfaghari, P., Kato, H., Minami, Y. et al. Dynamic Assignment of Gaussian Components in Modelling Speech Spectra. J VLSI Sign Process Syst Sign Image Video Technol 45, 7–19 (2006). https://doi.org/10.1007/s11265-006-9768-3

Published: 14 December 2006
Issue Date: November 2006
DOI: https://doi.org/10.1007/s11265-006-9768-3
