Voice Conversion for TTS Systems with Tuning on the Target Speaker Based on GMM

  • Conference paper
  • In: Speech and Computer (SPECOM 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10458)

Abstract

The paper addresses the improvement of voice conversion (VC) methods for text-to-speech synthesis systems that can be tuned to a target speaker. It presents such a system, with a VC module in the acoustic processor and a parametric representation of the speech database for concatenative synthesis based on instantaneous harmonic representation. Voice conversion is based on a multiple regression mapping function and a Gaussian mixture model (GMM); text-independent learning is based on hidden Markov models and a modified Viterbi algorithm. An experimental evaluation of the proposed solutions in terms of naturalness and similarity is also presented.
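
For orientation, the following is a minimal sketch of the classic GMM regression mapping commonly used in voice conversion, in which each source feature vector is converted by a posterior-weighted sum of per-component linear regressions. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not give the exact form of its multiple regression mapping function, and the function names (`gaussian_pdf`, `gmm_convert`) and parameter names (`weights`, `mu_x`, `mu_y`, `cov_xx`, `cov_yx`) are hypothetical. The GMM parameters are assumed to have been trained already on aligned source and target features.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Density of a multivariate normal N(mean, cov) at point x."""
    d = x.shape[0]
    diff = x - mean
    solved = np.linalg.solve(cov, diff)  # cov^{-1} (x - mean)
    norm = np.sqrt(((2.0 * np.pi) ** d) * np.linalg.det(cov))
    return float(np.exp(-0.5 * diff @ solved) / norm)


def gmm_convert(x, weights, mu_x, mu_y, cov_xx, cov_yx):
    """Map a source feature vector x into the target speaker's space.

    Assumed (textbook) form of the mapping:
        F(x) = sum_i p_i(x) * (mu_y[i] + cov_yx[i] @ inv(cov_xx[i]) @ (x - mu_x[i]))
    where p_i(x) is the posterior probability of component i given x.
    """
    # Weighted likelihood of x under each source-side mixture component.
    likelihoods = np.array([
        w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, mu_x, cov_xx)
    ])
    posteriors = likelihoods / likelihoods.sum()

    # Posterior-weighted sum of the per-component linear regressions.
    y = np.zeros_like(mu_y[0], dtype=float)
    for p, mx, my, cxx, cyx in zip(posteriors, mu_x, mu_y, cov_xx, cov_yx):
        y += p * (my + cyx @ np.linalg.solve(cxx, x - mx))
    return y
```

With a single mixture component the mapping reduces to one linear regression; with several components, a vector that lies close to one source mean is converted almost entirely by that component's regression.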

Acknowledgment

This work was supported by IT4YOU company (Moscow, Russian Federation).

Corresponding author

Correspondence to Vadim Zahariev.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Zahariev, V., Azarov, E., Petrovsky, A. (2017). Voice Conversion for TTS Systems with Tuning on the Target Speaker Based on GMM. In: Karpov, A., Potapova, R., Mporas, I. (eds) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_79

  • DOI: https://doi.org/10.1007/978-3-319-66429-3_79

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66428-6

  • Online ISBN: 978-3-319-66429-3

  • eBook Packages: Computer Science, Computer Science (R0)
