
Pitch Synchronous Transform Warping in Voice Conversion

  • Conference paper
Cognitive Behavioural Systems

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7403)

Abstract

This paper presents a new voice conversion algorithm that transforms an utterance of a source speaker into an utterance of a target speaker. The approach is based on pitch-synchronous speech analysis, the Discrete Cosine Transform (DCT), nonlinear spectral warping with spectrum interpolation, and pitch-synchronous speech synthesis with overlapping using a speech production model. The DCT speech model also contains information about the phase properties of the modeled speech frame but, in contrast to a model based on, e.g., the discrete Fourier transform, it is a real model and can therefore be used efficiently for speech coding and voice conversion. The finite impulse response of the converted DCT speech model is obtained by the inverse DCT and is of mixed-phase type. The proposed voice conversion procedure yields speech of high naturalness.
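The analysis, warping and synthesis chain described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes each pitch-synchronous frame is already available as a list of samples, uses an orthonormal DCT-II/DCT-III pair, a two-segment piecewise-linear warping function (`piecewise_linear_warp`, an assumption of this sketch) with linear interpolation between DCT bins, and places each converted real impulse response at a target pitch mark by plain overlap-add. All function names are illustrative.

```python
import math
from typing import Callable, List

def dct2(x: List[float]) -> List[float]:
    """Orthonormal DCT-II of a real frame (direct O(N^2) form, for illustration)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct2(c: List[float]) -> List[float]:
    """Inverse of dct2 (orthonormal DCT-III); the result is real-valued."""
    n = len(c)
    out = []
    for i in range(n):
        s = math.sqrt(1.0 / n) * c[0]
        s += sum(math.sqrt(2.0 / n) * c[k] * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out

def piecewise_linear_warp(knee_src: float, knee_tgt: float) -> Callable[[float], float]:
    """Hypothetical warp through (0, 0), (knee_tgt, knee_src), (1, 1).

    Maps a normalized target frequency to the normalized source frequency
    it reads from; both knees must lie strictly between 0 and 1."""
    def w(f: float) -> float:
        if f <= knee_tgt:
            return f * knee_src / knee_tgt
        return knee_src + (f - knee_tgt) * (1.0 - knee_src) / (1.0 - knee_tgt)
    return w

def warp_coefficients(c: List[float], warp: Callable[[float], float]) -> List[float]:
    """Resample DCT coefficients along a warped frequency axis.

    Values between neighbouring DCT bins are linearly interpolated
    (an assumed choice of spectrum interpolation)."""
    n = len(c)
    if n < 2:
        return list(c)
    out = []
    for k in range(n):
        pos = warp(k / (n - 1)) * (n - 1)      # fractional source bin
        pos = min(max(pos, 0.0), n - 1.0)      # clamp to the valid bin range
        lo = int(math.floor(pos))
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        out.append((1.0 - frac) * c[lo] + frac * c[hi])
    return out

def convert_frame(frame: List[float], warp: Callable[[float], float]) -> List[float]:
    """DCT -> warp -> inverse DCT: a real FIR response of the same length."""
    return idct2(warp_coefficients(dct2(frame), warp))

def overlap_add(responses: List[List[float]], marks: List[int], length: int) -> List[float]:
    """Add each response into the output starting at its pitch mark."""
    out = [0.0] * length
    for h, m in zip(responses, marks):
        for i, v in enumerate(h):
            if 0 <= m + i < length:
                out[m + i] += v
    return out
```

For example, `convert_frame(frame, piecewise_linear_warp(0.4, 0.5))` reads the coefficient for normalized target frequency 0.5 from source frequency 0.4, stretching the lower part of the band upward; with the identity warp `lambda f: f` the frame is returned unchanged up to rounding. A real system would also need pitch marking, frame windowing and gain matching, which this sketch omits.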




Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Vích, R., Vondra, M. (2012). Pitch Synchronous Transform Warping in Voice Conversion. In: Esposito, A., Esposito, A.M., Vinciarelli, A., Hoffmann, R., Müller, V.C. (eds) Cognitive Behavioural Systems. Lecture Notes in Computer Science, vol 7403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34584-5_24


  • DOI: https://doi.org/10.1007/978-3-642-34584-5_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-34583-8

  • Online ISBN: 978-3-642-34584-5

  • eBook Packages: Computer Science, Computer Science (R0)
