Fake Speech Recognition Using Deep Learning

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1431)

Abstract

The growing number of algorithms and commercial tools for creating synthetic audio has led to a high level of misinformation, especially on social media. As a consequence, recent efforts have focused on detecting this type of content. However, the task is far from solved, as fake audio sounds increasingly natural. In this paper we present a model that classifies audio recordings as natural or fake, using an audio preparation stage that includes raw audio transformation and a modelling stage based on a custom Convolutional Neural Network (CNN) architecture. Our model is trained on data from the FoR dataset, which contains natural and synthetic audio generated by several deepfake algorithms. The performance of the model is evaluated with metrics such as F1 score, precision (P) and recall (R). According to the results, the recordings are correctly classified in 88.9% of the cases.
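
The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes that the fake class is treated as the positive class (label 1) and uses made-up toy labels; neither the label convention nor the data comes from the paper, and the function name `binary_metrics` is hypothetical.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute precision (P), recall (R), F1 score and accuracy.

    Assumption for illustration: label 1 = fake (positive), label 0 = natural.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fake detected as fake
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # natural flagged as fake
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fake missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # natural kept as natural

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0

    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Toy labels for illustration only; these are not results from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Precision penalises natural recordings flagged as fake, recall penalises fake recordings that go undetected, and F1 is their harmonic mean, so reporting all three gives a fuller picture than a single accuracy figure.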

Acknowledgment

This work is supported by the Universidad Militar Nueva Granada, Vicerrectoría de Investigaciones, under grant IMP-ING-2936 of 2019.

Author information

Corresponding author

Correspondence to Steven Camacho.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Camacho, S., Ballesteros, D.M., Renza, D. (2021). Fake Speech Recognition Using Deep Learning. In: Figueroa-García, J.C., Díaz-Gutierrez, Y., Gaona-García, E.E., Orjuela-Cañón, A.D. (eds) Applied Computer Sciences in Engineering. WEA 2021. Communications in Computer and Information Science, vol 1431. Springer, Cham. https://doi.org/10.1007/978-3-030-86702-7_4

  • DOI: https://doi.org/10.1007/978-3-030-86702-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86701-0

  • Online ISBN: 978-3-030-86702-7

  • eBook Packages: Computer Science, Computer Science (R0)
