Creation and Detection of German Voice Deepfakes

Barnekow, Vanessa; Binder, Dominik; Kromrey, Niclas; Munaretto, Pascal; Schaad, Andreas; Schmieder, Felix

doi:10.1007/978-3-031-08147-7_24

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13291)

Included in the following conference series:

International Symposium on Foundations and Practice of Security

Abstract

Voice synthesis with machine learning techniques has made rapid progress in recent years [1]. Given the current increase in the use of conferencing tools for online teaching, we ask just how easy it would be to create a convincing voice fake in terms of required data, hardware, and skill set. We analyse how much training data a participant (e.g. a student) would actually need to fake another participant's voice (e.g. a professor's). We provide an analysis of the existing state of the art in creating voice deepfakes and align the identified optimization techniques, as well as our own, in the context of two different voice data sets. A user study with more than 100 participants shows how difficult it is to tell real and fake voices apart (on average, only 37% can recognize a professor's fake voice). From a longer-term societal perspective, such voice deepfakes may lead to disbelief by default.

References

Wang, Y., et al.: Tacotron: Towards end-to-end speech synthesis (2017)
Stupp, C.: Fraudsters Used AI to Mimic CEO's Voice in Unusual Cybercrime Case (2019). https://www.wsj.com/articles/fraudsters-use-ai-to-mimic-ceos-voice-in-unusual-cybercrime-case-11567157402. Accessed 14 July 2021
Shen, J., et al.: Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions (2018)
Łańcucki, A.: FastPitch: Parallel text-to-speech with pitch prediction (2021)
Ren, Y., et al.: FastSpeech 2: Fast and high-quality end-to-end text to speech (2021)
van den Oord, A., et al.: WaveNet: A generative model for raw audio (2016)
Barnekow, V., Binder, D., Kromrey, N., Munaretto, P., Schaad, A., Schmieder, F.: Creation and detection of German voice deepfakes (2021)
NVIDIA: Deep Learning Performance Documentation (2021). https://docs.nvidia.com/deeplearning/performance/mixed-precision-training. Accessed 31 Mar 2021
Prenger, R., Valle, R., Catanzaro, B.: WaveGlow: A flow-based generative network for speech synthesis (2018)
Kumar, K., et al.: MelGAN: Generative adversarial networks for conditional waveform synthesis (2019)
Yamamoto, R., Song, E., Kim, J.-M.: Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram (2020)
Maccarone, T.J.: The biphase explained: understanding the asymmetries in coupled Fourier components of astronomical time series. Monthly Notices Roy. Astron. Soc. 435(4), 3547–3558 (2013). ISSN: 0035-8711. https://doi.org/10.1093/mnras/stt1546
AlBadawy, E.A., Lyu, S., Farid, H.: Detecting AI-synthesized speech using bispectral analysis. In: CVPR Workshops, pp. 104–109 (2019)

Author information

Authors and Affiliations

Offenburg University of Applied Sciences, Offenburg, Germany
Vanessa Barnekow, Dominik Binder, Niclas Kromrey, Pascal Munaretto, Andreas Schaad & Felix Schmieder

Corresponding author

Correspondence to Vanessa Barnekow.

Editor information

Editors and Affiliations

University of Montreal, Montreal, QC, Canada
Esma Aïmeur
Télécom SudParis, Palaiseau, France
Maryline Laurent
IRT SystemX, Palaiseau, France
Reda Yaich
University of Montreal, Montreal, QC, Canada
Benoît Dupont
Télécom SudParis, Palaiseau, France
Joaquin Garcia-Alfaro

About this paper

Cite this paper

Barnekow, V., Binder, D., Kromrey, N., Munaretto, P., Schaad, A., Schmieder, F. (2022). Creation and Detection of German Voice Deepfakes. In: Aïmeur, E., Laurent, M., Yaich, R., Dupont, B., Garcia-Alfaro, J. (eds) Foundations and Practice of Security. FPS 2021. Lecture Notes in Computer Science, vol 13291. Springer, Cham. https://doi.org/10.1007/978-3-031-08147-7_24

DOI: https://doi.org/10.1007/978-3-031-08147-7_24
Published: 15 June 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-08146-0
Online ISBN: 978-3-031-08147-7
eBook Packages: Computer Science, Computer Science (R0)
