Parametric Formant Modelling and Transformation in Voice Conversion

Rentzos, Dimitrios; Vaseghi, Saeed; Yan, Qin; Ho, Ching-Hsiang

doi:10.1007/s10772-006-5692-y

Published: 02 June 2006

Volume 8, pages 227–245 (2005)

International Journal of Speech Technology

Dimitrios Rentzos¹, Saeed Vaseghi¹, Qin Yan¹ & Ching-Hsiang Ho²

Abstract

This paper presents a method for the estimation and mapping of parametric models of speech resonance at formants for voice conversion. The spectral features at formants that contribute to voice characteristics are the trajectories of the frequencies, the bandwidths and intensities of the resonance at formants. The formant features are extracted from the poles of a linear prediction (LP) model of speech. The statistical distributions of formants are modelled by a two-dimensional hidden Markov model (HMM) spanning the time and frequency dimensions. Experimental results are presented which show a close match between HMM-based formant models and the histograms of formants. For voice conversion, two alternative methods are explored for mapping the formants of a source speaker to those of a target speaker. The first method is based on an adaptive formant-tracking warping of the frequency response of the LP model, and the second method is based on the rotation of the poles of the LP model of speech. Both methods transform all spectral parameters of the resonance at formants of the source speaker towards those of the target speaker. In addition, the issues affecting the selection of the warping ratios for the mapping functions are investigated. Experimental results of formant estimation and perceptual evaluation of voice morphing based on parametric formant models are presented.
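
The abstract's two pole-based ideas, reading formants off the poles of an LP model and moving formants by rotating those poles, can be illustrated with a minimal sketch. This is not the authors' implementation, which is not visible in this preview. The function names, the 8 kHz sampling rate, the example formant values and the per-formant ratios are all assumptions chosen for illustration. The frequency and bandwidth relations are the standard textbook ones for a pole at radius r and angle θ: F = θ·fs/(2π) and B = −ln(r)·fs/π. The sketch changes only the pole angles, so bandwidths stay fixed, whereas the paper states that its methods transform all spectral parameters of the formants.

```python
import numpy as np

def formants_to_lp(freqs_hz, bws_hz, fs):
    """Build LP coefficients [1, a1, ..., ap] from formant specs (demo helper, hypothetical)."""
    radii = np.exp(-np.pi * np.asarray(bws_hz, dtype=float) / fs)
    angles = 2 * np.pi * np.asarray(freqs_hz, dtype=float) / fs
    upper = radii * np.exp(1j * angles)
    return np.real(np.poly(np.concatenate([upper, np.conj(upper)])))

def poles_to_formants(lp_coeffs, fs):
    """Return (frequencies_hz, bandwidths_hz), one entry per complex-conjugate pole pair."""
    poles = np.roots(lp_coeffs)
    poles = poles[np.imag(poles) > 0]            # one pole per conjugate pair
    freqs = np.angle(poles) * fs / (2 * np.pi)   # pole angle -> resonance frequency
    bws = -np.log(np.abs(poles)) * fs / np.pi    # pole radius -> bandwidth
    order = np.argsort(freqs)
    return freqs[order], bws[order]

def rotate_poles(lp_coeffs, fs, ratios):
    """Scale the k-th formant frequency by ratios[k] by rotating its pole pair.

    Assumes every scaled frequency stays strictly between 0 and fs/2.
    Pole radii (and hence bandwidths) are left unchanged in this sketch.
    """
    poles = np.roots(lp_coeffs)
    upper = poles[np.imag(poles) > 0]
    upper = upper[np.argsort(np.angle(upper))]   # order by ascending frequency
    real_poles = poles[np.imag(poles) == 0]      # real poles are not formants; keep as-is
    ratios = np.asarray(ratios, dtype=float)
    if ratios.shape != upper.shape:
        raise ValueError("need one ratio per complex-conjugate pole pair")
    new_angles = np.angle(upper) * ratios
    if np.any(new_angles <= 0) or np.any(new_angles >= np.pi):
        raise ValueError("rotated formant would leave the (0, fs/2) range")
    rotated = np.abs(upper) * np.exp(1j * new_angles)
    all_poles = np.concatenate([rotated, np.conj(rotated), real_poles])
    return np.real(np.poly(all_poles))

# Illustrative values only: three source formants and made-up mapping ratios.
fs = 8000
a_source = formants_to_lp([500, 1500, 2500], [60, 90, 120], fs)
a_mapped = rotate_poles(a_source, fs, [1.1, 0.95, 1.05])
print(poles_to_formants(a_source, fs))
print(poles_to_formants(a_mapped, fs))
```

In practice the LP coefficients would come from frame-by-frame analysis of recorded speech rather than from a synthetic helper, and the ratios would be estimated from source and target formant statistics, which is the selection problem the abstract says the paper investigates.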

Author information

Authors and Affiliations

Department of Electronic and Computer Engineering, Brunel University, London
Dimitrios Rentzos, Saeed Vaseghi & Qin Yan
Fortune Institute of Technology, Kaohsiung, Taiwan, 842, R.O.C.
Ching-Hsiang Ho

Corresponding author

Correspondence to Dimitrios Rentzos.

About this article

Cite this article

Rentzos, D., Vaseghi, S., Yan, Q. et al. Parametric Formant Modelling and Transformation in Voice Conversion. Int J Speech Technol 8, 227–245 (2005). https://doi.org/10.1007/s10772-006-5692-y

Issue Date: September 2005
DOI: https://doi.org/10.1007/s10772-006-5692-y
