Abstract
In this study, we propose a voice conversion technique from arbitrary speakers based on deep neural networks with adversarial learning, realized by introducing adversarial learning into conventional voice conversion. Adversarial learning is expected to enable more natural voice conversion by adding, alongside the generative model, a discriminative model that classifies input speech as either natural or converted. Experiments showed that the proposed method was effective in enhancing the global variance (GV) of the mel-cepstrum, although the naturalness of the converted speech was slightly lower than that of speech obtained with the conventional variance compensation technique.
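The adversarial setup described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a linear map stands in for the generative DNN, a logistic-regression discriminator stands in for the discriminative model, and all data, dimensions, and hyperparameters are illustrative. The generator minimizes a conversion error plus a term for fooling the discriminator, while the discriminator learns to separate "natural" from "converted" features.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N = 4, 256  # toy feature dimension and number of frames

# Hypothetical parallel data: source-speaker features and target ("natural")
# features related by an unknown per-dimension scaling plus noise.
src = rng.normal(0.0, 1.0, (N, D))
A_true = np.diag([1.5, 0.8, 1.2, 0.6])
tgt = src @ A_true + 0.05 * rng.normal(size=(N, D))

G = np.eye(D)                 # generator: linear map (stand-in for the DNN)
w, b = np.zeros(D), 0.0       # discriminator: logistic regression

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr_g, lr_d, lam = 0.05, 0.1, 0.1  # lam weights the adversarial term
for step in range(300):
    conv = src @ G  # converted features

    # Discriminator update: push natural -> 1, converted -> 0
    p_nat = sigmoid(tgt @ w + b)
    p_cnv = sigmoid(conv @ w + b)
    grad_w = (tgt.T @ (p_nat - 1.0) + conv.T @ p_cnv) / N
    grad_b = np.mean(p_nat - 1.0) + np.mean(p_cnv)
    w -= lr_d * grad_w
    b -= lr_d * grad_b

    # Generator update: conversion error + adversarial term
    # (the generator tries to make converted frames look "natural")
    err = conv - tgt
    p_cnv = sigmoid(conv @ w + b)
    grad_G = src.T @ (2.0 * err + lam * (p_cnv - 1.0)[:, None] * w[None, :]) / N
    G -= lr_g * grad_G

mse = float(np.mean((src @ G - tgt) ** 2))  # final conversion error
```

After training, `G` approaches the true mapping and the discriminator can no longer reliably separate the two classes; in the paper, the analogous adversarial term is what counteracts the over-smoothing (reduced GV) of conventional conversion.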
Acknowledgment
Part of this work was supported by JSPS KAKENHI Grant Numbers JP26280055 and JP15H02720.
Copyright information
© 2018 Springer International Publishing AG
Cite this paper
Miyamoto, S. et al. (2018). Voice Conversion from Arbitrary Speakers Based on Deep Neural Networks with Adversarial Learning. In: Pan, JS., Tsai, PW., Watada, J., Jain, L. (eds) Advances in Intelligent Information Hiding and Multimedia Signal Processing. IIH-MSP 2017. Smart Innovation, Systems and Technologies, vol 82. Springer, Cham. https://doi.org/10.1007/978-3-319-63859-1_13
Print ISBN: 978-3-319-63858-4
Online ISBN: 978-3-319-63859-1