Deepfakes Audio Detection Leveraging Audio Spectrogram and Convolutional Neural Networks

Wani, Taiba Majid; Amerini, Irene

doi:10.1007/978-3-031-43153-1_14

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14234)

Included in the following conference series:

International Conference on Image Analysis and Processing


Abstract

The proliferation of algorithms and commercial tools for creating synthetic audio has led to a significant increase in misinformation, particularly on social media platforms. As a result, recent efforts have concentrated on detecting such content. Even so, the problem is far from solved, because fake or synthetic audio sounds increasingly natural. In this study, we propose several network configurations to classify real and fake audio: a custom convolutional neural network (cCNN) and two pretrained models (VGG16 and MobileNet), as well as end-to-end models. An extensive experimental analysis was carried out on three classes of audio manipulation in the FoR deepfake audio dataset. We also merged these sub-datasets into a single dataset, FoR-combined, to improve the performance of the models. The experimental analysis shows that the proposed cCNN outperforms all the baseline models and other reference works, reaching the highest accuracy of 97.23% on FoR-combined, and sets new benchmarks for the datasets.
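The abstract does not spell out how audio is turned into network input, but the title points to spectrograms fed to CNNs. The following is a minimal sketch of that first step only, assuming a plain short-time Fourier transform with a Hann window. The function name `log_spectrogram`, the frame length, the hop size, and the 16 kHz sample rate are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def log_spectrogram(wave, frame_len=256, hop=128, eps=1e-10):
    """Log-power spectrogram of shape (n_freq_bins, n_frames).

    frame_len and hop are illustrative assumptions, not values from the paper.
    """
    wave = np.asarray(wave, dtype=np.float64)
    if wave.size < frame_len:
        # Zero-pad very short clips so at least one frame exists.
        wave = np.pad(wave, (0, frame_len - wave.size))
    n_frames = 1 + (wave.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the waveform into overlapping, windowed frames.
    frames = np.stack(
        [wave[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One-sided FFT per frame, then power in decibels.
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return 10.0 * np.log10(power + eps).T

# Synthetic one-second 1 kHz tone standing in for a real clip (assumed 16 kHz).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = log_spectrogram(tone)
peak_bin = int(np.argmax(spec.mean(axis=1)))
peak_hz = peak_bin * sample_rate / 256
```

The resulting 2-D array can be treated like a single-channel image, which is what makes image-oriented models such as VGG16 or MobileNet applicable to audio in the first place. Resizing and channel replication for those pretrained models are left out here.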



Acknowledgements

This study has been partially supported by SERICS (PE00000014) under the MUR National Recovery and Resilience Plan funded by the European Union – NextGenerationEU and Sapienza University of Rome project 2022–2024 “EV2” (003 009 22).

Author information

Authors and Affiliations

Sapienza University of Rome, Rome, Italy
Taiba Majid Wani & Irene Amerini

Authors

Taiba Majid Wani
Irene Amerini

Corresponding author

Correspondence to Taiba Majid Wani.

Editor information

Editors and Affiliations

University of Udine, Udine, Italy
Gian Luca Foresti
University of Udine, Udine, Italy
Andrea Fusiello
University of York, York, UK
Edwin Hancock


About this paper

Cite this paper

Wani, T.M., Amerini, I. (2023). Deepfakes Audio Detection Leveraging Audio Spectrogram and Convolutional Neural Networks. In: Foresti, G.L., Fusiello, A., Hancock, E. (eds) Image Analysis and Processing – ICIAP 2023. ICIAP 2023. Lecture Notes in Computer Science, vol 14234. Springer, Cham. https://doi.org/10.1007/978-3-031-43153-1_14


DOI: https://doi.org/10.1007/978-3-031-43153-1_14
Published: 05 September 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43152-4
Online ISBN: 978-3-031-43153-1
eBook Packages: Computer Science, Computer Science (R0)
