
Parametric Formant Modelling and Transformation in Voice Conversion

International Journal of Speech Technology

Abstract

This paper presents a method for estimating and mapping parametric models of the speech resonances at formants for voice conversion. The spectral features of formants that contribute to voice characteristics are the trajectories of the formant frequencies, bandwidths and intensities. These features are extracted from the poles of a linear prediction (LP) model of speech. The statistical distributions of formants are modelled by a two-dimensional hidden Markov model (HMM) spanning the time and frequency dimensions. Experimental results show a close match between the HMM-based formant models and the histograms of formants. For voice conversion, two alternative methods are explored for mapping the formants of a source speaker to those of a target speaker: the first is based on an adaptive, formant-tracking warping of the frequency response of the LP model, and the second on the rotation of the poles of the LP model. Both methods transform all the spectral parameters of the source speaker's formant resonances towards those of the target speaker. In addition, the issues affecting the selection of the warping ratios for the mapping functions are investigated. Experimental results of formant estimation and perceptual evaluation of voice morphing based on the parametric formant models are presented.
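Two of the steps named in the abstract, reading formant frequencies and bandwidths off the poles of an LP model and mapping formants by rotating those poles, can be illustrated with a short sketch. It assumes autocorrelation-method LP analysis (Levinson-Durbin), takes each upper-half-plane pole as one formant (frequency from the pole angle, bandwidth from the pole radius), and models "rotation" as scaling each pole angle by a per-formant warping ratio while keeping the radius, and hence the bandwidth, fixed. The function names `lpc`, `formants_from_lpc` and `rotate_poles`, the synthetic test signal and the ratio values are all hypothetical; this is not the authors' implementation, and it omits both the two-dimensional HMM formant models and the frequency-response warping method.

```python
import numpy as np

# Hypothetical sketch, not the paper's implementation: LP analysis,
# formants from LP poles, and formant mapping by pole-angle rotation.


def lpc(frame, order):
    """LP coefficients by the autocorrelation method (Levinson-Durbin).

    Returns (a, err) with a = [1, a_1, ..., a_p] for A(z) = 1 + sum_k a_k z^-k
    and err the final prediction-error power.
    """
    x = np.asarray(frame, dtype=float)
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def formants_from_lpc(a, fs):
    """Formant frequencies and bandwidths (Hz) from the complex poles of 1/A(z)."""
    poles = np.roots(a)
    poles = poles[np.imag(poles) > 0]       # one pole per conjugate pair
    poles = poles[np.argsort(np.angle(poles))]
    freqs = np.angle(poles) * fs / (2 * np.pi)
    bandwidths = -np.log(np.abs(poles)) * fs / np.pi
    return freqs, bandwidths


def rotate_poles(a, ratios):
    """Scale the angle of the k-th complex pole pair by ratios[k] (assumed mapping).

    Pole radii (hence bandwidths) are left unchanged; pole pairs beyond
    len(ratios) and any real poles are kept as they are.
    """
    poles = np.roots(a)
    upper = poles[np.imag(poles) > 0]
    upper = upper[np.argsort(np.angle(upper))]
    real = poles[np.imag(poles) == 0]
    rotated = []
    for k, p in enumerate(upper):
        ratio = ratios[k] if k < len(ratios) else 1.0
        angle = np.clip(np.angle(p) * ratio, 1e-3, np.pi - 1e-3)
        rotated.append(np.abs(p) * np.exp(1j * angle))
    rotated = np.array(rotated, dtype=complex)
    new_poles = np.concatenate([rotated, np.conj(rotated), real])
    return np.real(np.poly(new_poles))


# Usage: a synthetic 4th-order all-pole signal with resonances at 500 and 1500 Hz.
fs = 8000
pair = [0.97 * np.exp(2j * np.pi * f / fs) for f in (500.0, 1500.0)]
a_true = np.real(np.poly(pair + [np.conj(p) for p in pair]))
rng = np.random.default_rng(0)
excitation = rng.standard_normal(8000)
speech = np.zeros_like(excitation)
for n in range(len(speech)):                 # all-pole filter 1/A(z)
    past = speech[max(0, n - 4):n][::-1]     # x[n-1], x[n-2], ...
    speech[n] = excitation[n] - np.dot(a_true[1:1 + len(past)], past)

a_est, _ = lpc(speech, order=4)
f_src, b_src = formants_from_lpc(a_est, fs)
a_tgt = rotate_poles(a_est, ratios=[1.2, 1.1])
f_tgt, b_tgt = formants_from_lpc(a_tgt, fs)
print(np.round(f_src), np.round(f_tgt))
```

Keeping the pole radius fixed makes the sketch change only formant frequencies; a fuller mapping in the spirit of the abstract would also move bandwidths and intensities towards the target speaker's values.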



Author information

Correspondence to Dimitrios Rentzos.


About this article

Cite this article

Rentzos, D., Vaseghi, S., Yan, Q. et al. Parametric Formant Modelling and Transformation in Voice Conversion. Int J Speech Technol 8, 227–245 (2005). https://doi.org/10.1007/s10772-006-5692-y


  • DOI: https://doi.org/10.1007/s10772-006-5692-y
