Creation and Detection of German Voice Deepfakes

  • Conference paper
Foundations and Practice of Security (FPS 2021)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13291))

Abstract

Voice synthesis with the help of machine learning techniques has made rapid progress in recent years [1]. Given the current increase in the use of conferencing tools for online teaching, we ask how easy it would be to create a convincing voice fake in terms of required data, hardware, and skill set. We analyse how much training data a participant (e.g. a student) would actually need to fake another participant’s voice (e.g. a professor’s). We provide an analysis of the existing state of the art in creating voice deepfakes and examine the identified optimization techniques, as well as our own, in the context of two different voice data sets. A user study with more than 100 participants shows how difficult it is to distinguish real from fake voices (on average, only 37% can recognize a professor’s fake voice). From a longer-term societal perspective, such voice deepfakes may lead to disbelief by default.

Notes

  1. https://github.com/readbeyond/aeneas

  2. https://github.com/NVIDIA/tacotron2

References

  1. Wang, Y., et al.: Tacotron: Towards end-to-end speech synthesis (2017)

  2. Stupp, C.: Fraudsters Used AI to Mimic CEO’s Voice in Unusual Cybercrime Case (2019). https://www.wsj.com/articles/fraudsters-use-ai-to-mimic-ceos-voice-in-unusual-cybercrime-case-11567157402. Accessed 14 July 2021

  3. Shen, J., et al.: Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions (2018)

  4. Łańcucki, A.: FastPitch: Parallel text-to-speech with pitch prediction (2021)

  5. Ren, Y., et al.: FastSpeech 2: Fast and high-quality end-to-end text to speech (2021)

  6. van den Oord, A., et al.: WaveNet: A generative model for raw audio (2016)

  7. Barnekow, V., Binder, D., Kromrey, N., Munaretto, P., Schaad, A., Schmieder, F.: Creation and detection of German voice deepfakes (2021)

  8. NVIDIA: Deep Learning Performance Documentation (2021). https://docs.nvidia.com/deeplearning/performance/mixed-precision-training. Accessed 31 Mar 2021

  9. Prenger, R., Valle, R., Catanzaro, B.: WaveGlow: A flow-based generative network for speech synthesis (2018)

  10. Kumar, K., et al.: MelGAN: Generative adversarial networks for conditional waveform synthesis (2019)

  11. Yamamoto, R., Song, E., Kim, J.-M.: Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram (2020)

  12. Maccarone, T.J.: The biphase explained: understanding the asymmetries in coupled Fourier components of astronomical time series. Monthly Notices Roy. Astron. Soc. 435(4), 3547–3558 (2013). ISSN: 0035-8711. https://doi.org/10.1093/mnras/stt1546

  13. AlBadawy, E.A., Lyu, S., Farid, H.: Detecting AI-synthesized speech using bispectral analysis. In: CVPR Workshops, pp. 104–109 (2019)

Author information

Corresponding author

Correspondence to Vanessa Barnekow.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Barnekow, V., Binder, D., Kromrey, N., Munaretto, P., Schaad, A., Schmieder, F. (2022). Creation and Detection of German Voice Deepfakes. In: Aïmeur, E., Laurent, M., Yaich, R., Dupont, B., Garcia-Alfaro, J. (eds) Foundations and Practice of Security. FPS 2021. Lecture Notes in Computer Science, vol 13291. Springer, Cham. https://doi.org/10.1007/978-3-031-08147-7_24

  • DOI: https://doi.org/10.1007/978-3-031-08147-7_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-08146-0

  • Online ISBN: 978-3-031-08147-7

  • eBook Packages: Computer Science, Computer Science (R0)
