Voice Cloning for Voice Disorders: Impact of Phonetic Content

Wadoux, Lily; Barbot, Nelly; Chevelu, Jonathan; Lolive, Damien

doi:10.1007/978-3-031-40498-6_26

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14102)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

Organic dysphonia can lead to vocal impairments. Recording patients’ impaired voices could allow them to use voice cloning systems. Voice cloning, the process of producing speech that matches a target speaker’s voice given textual input and an audio sample from that speaker, can be used in such a context. However, dysphonic patients may only be able to produce speech with specific or limited phonetic content.

Considering a complete voice cloning process, we use phonetically limited artificial voices to investigate how the phonetic content and the length of the samples affect output quality and speaker similarity.

The analysis of the speaker embeddings, which are used to capture voices, shows an impact of the phonetic content. However, we were not able to observe those variations in the final generated speech.
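One way to picture the embedding analysis mentioned above is to compare embeddings of phonetically limited samples against a reference embedding of the same speaker. The sketch below is illustrative only: the abstract does not state which metric or encoder the authors used, so cosine similarity is an assumption here, the function names are hypothetical, and the three-dimensional vectors are made-up toy values rather than results from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_similarity_to_reference(
    reference: Sequence[float],
    embeddings: Sequence[Sequence[float]],
) -> float:
    """Average cosine similarity of several embeddings to one reference."""
    if not embeddings:
        raise ValueError("at least one embedding is required")
    total = sum(cosine_similarity(reference, e) for e in embeddings)
    return total / len(embeddings)


# Toy values (assumed, not from the paper): a reference speaker embedding,
# embeddings from phonetically rich samples, and embeddings from samples
# restricted to limited phonetic content.
reference = [1.0, 0.0, 0.0]
full_content = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]]
limited_content = [[0.5, 0.5, 0.0], [0.4, 0.0, 0.6]]

full_score = mean_similarity_to_reference(reference, full_content)
limited_score = mean_similarity_to_reference(reference, limited_content)
print(f"full content:    {full_score:.3f}")
print(f"limited content: {limited_score:.3f}")
```

With real data, the vectors would come from a speaker encoder applied to recorded samples. A lower score for the limited-content group would indicate that phonetic content shifts the embedding, although, as the abstract notes, such a shift need not be audible in the generated speech.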

Acknowledgements

This work was granted access to the HPC resources of IDRIS under the allocation 2023-AD011011870R2 made by GENCI.

Author information

Authors and Affiliations

Univ Rennes, CNRS, IRISA, 22300, Lannion, France
Lily Wadoux, Nelly Barbot, Jonathan Chevelu & Damien Lolive

Corresponding author

Correspondence to Jonathan Chevelu.

Editor information

Editors and Affiliations

University of West Bohemia, Pilsen, Czech Republic
Kamil Ekštein
University of West Bohemia, Pilsen, Czech Republic
František Pártl
University of West Bohemia, Pilsen, Czech Republic
Miloslav Konopík

About this paper

Cite this paper

Wadoux, L., Barbot, N., Chevelu, J., Lolive, D. (2023). Voice Cloning for Voice Disorders: Impact of Phonetic Content. In: Ekštein, K., Pártl, F., Konopík, M. (eds) Text, Speech, and Dialogue. TSD 2023. Lecture Notes in Computer Science, vol 14102. Springer, Cham. https://doi.org/10.1007/978-3-031-40498-6_26

DOI: https://doi.org/10.1007/978-3-031-40498-6_26
Published: 23 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40497-9
Online ISBN: 978-3-031-40498-6
eBook Packages: Computer Science, Computer Science (R0)
