Deepfakes Audio Detection Leveraging Audio Spectrogram and Convolutional Neural Networks

  • Conference paper
Image Analysis and Processing – ICIAP 2023 (ICIAP 2023)

Abstract

The proliferation of algorithms and commercial tools for creating synthetic audio has significantly increased the amount of inaccurate information, particularly on social media platforms. As a direct result, recent efforts have concentrated on detecting such content. Even so, the problem is far from adequately addressed because fake or synthetic audio keeps growing more natural. In this study, we propose different network configurations to classify real and fake audio: a Custom Convolutional Neural Network (cCNN) and two pretrained models (VGG16 and MobileNet), as well as end-to-end models. An extensive experimental analysis was carried out on three classes of audio manipulation from the FoR deepfake audio dataset. We also merged these sub-datasets into a combined dataset, FoR-combined, to enhance the performance of the models. The experimental analysis shows that the proposed cCNN outperforms all the baseline models and other reference works, achieving the highest accuracy of 97.23% on FoR-combined and setting new benchmarks for the datasets.
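The abstract does not give preprocessing details, so the following is only a minimal sketch of how a waveform might be turned into the kind of spectrogram image a CNN classifier takes as input. The function name `log_spectrogram`, the frame length, the hop size, the Hann window, and the 16 kHz sampling rate are all illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def log_spectrogram(signal, frame_len=512, hop=128, eps=1e-10):
    """Log-magnitude spectrogram via a short-time Fourier transform.

    All parameter defaults are illustrative assumptions, not the paper's settings.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        # Zero-pad short clips so at least one frame exists.
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the waveform into overlapping frames: shape (n_frames, frame_len).
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * window
    # One-sided FFT magnitude per frame: shape (n_frames, frame_len // 2 + 1).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Decibel scale; transpose so rows are frequency bins and columns are time.
    return (20.0 * np.log10(magnitude + eps)).T

# Toy input: one second of a 1 kHz tone at an assumed 16 kHz sampling rate.
sr = 16000
t = np.arange(sr) / sr
spec = log_spectrogram(np.sin(2 * np.pi * 1000 * t))
print(spec.shape)  # (frequency bins, time frames)
```

The resulting 2-D array can be treated as a single-channel image. Pretrained image models such as VGG16 or MobileNet typically expect three-channel inputs of a fixed size, so in practice the array would likely be resized and replicated across channels before being passed to them.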

Acknowledgements

This study has been partially supported by SERICS (PE00000014) under the MUR National Recovery and Resilience Plan funded by the European Union – NextGenerationEU and Sapienza University of Rome project 2022–2024 “EV2” (003 009 22).

Author information

Corresponding author

Correspondence to Taiba Majid Wani.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wani, T.M., Amerini, I. (2023). Deepfakes Audio Detection Leveraging Audio Spectrogram and Convolutional Neural Networks. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing – ICIAP 2023. ICIAP 2023. Lecture Notes in Computer Science, vol 14234. Springer, Cham. https://doi.org/10.1007/978-3-031-43153-1_14

  • DOI: https://doi.org/10.1007/978-3-031-43153-1_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43152-4

  • Online ISBN: 978-3-031-43153-1

  • eBook Packages: Computer Science (R0)