A Multi-stage Multi-modal Classification Model for DeepFakes Combining Deep Learned and Computer Vision Oriented Features

Das, Arnab Kumar; Mukhopadhyay, Soumik; Dalui, Arijit; Bhattacharya, Ritaban; Naskar, Ruchira

doi:10.1007/978-3-031-49099-6_13

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14424)

Included in the following conference series:

International Conference on Information Systems Security

Abstract

Recent advances in deep learning have enabled media synthesis and alteration at previously unattainable levels of realism. Emerging deepfake technologies use artificial intelligence to modify digital data such as images, videos, and audio files. By synthesizing fake media, they can significantly undermine the reliability of multimedia data, with serious ramifications for individuals, organizations, and society at large. Given the pace and accessibility of social media, convincing deepfakes can swiftly reach millions of people and adversely influence public opinion. To address this, we propose a multi-modal feature-based classification model that efficiently distinguishes deepfake videos from real ones. We use prefabricated image features together with features generated by several Convolutional Neural Network (CNN) models, including ResNet50, ResNet101, VGG16, and VGG19. The fake videos are then investigated further to identify their source of origin. We propose a CNN-based classifier for deepfake detection and also explore the efficiency of multiple feature-based classifiers for the same task, which lets us compare the performance of the two approaches. The proposed model achieves an accuracy of 99.06% on deepfake classification and 98.75% on source identification when tested on the publicly available FaceForensics++ dataset.
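The two-stage flow described in the abstract (first real versus fake, then source identification for the fakes) can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration and is not taken from the paper: the fusion of prefabricated and CNN-generated features by simple concatenation, the toy feature vectors, the placeholder source labels `method_A` and `method_B`, and the nearest-centroid classifier, which stands in for the CNN and feature-based classifiers the authors actually evaluate.

```python
from math import dist
from statistics import fmean


def fuse(prefabricated, deep):
    """Fuse two feature vectors by concatenation (assumed fusion strategy)."""
    return list(prefabricated) + list(deep)


class NearestCentroid:
    """Toy stand-in classifier: predicts the label of the closest class mean."""

    def fit(self, X, y):
        self.centroids = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            # Column-wise mean of all training vectors with this label.
            self.centroids[label] = [fmean(col) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        return min(self.centroids, key=lambda label: dist(x, self.centroids[label]))


def classify(features, stage1, stage2):
    """Stage 1: real vs. fake. Stage 2 (fakes only): source of origin."""
    if stage1.predict_one(features) == "real":
        return ("real", None)
    return ("fake", stage2.predict_one(features))


# Hypothetical training data: (fused features, stage-1 label, stage-2 label).
train = [
    (fuse([0.0, 0.1], [0.0, 0.0]), "real", None),
    (fuse([0.1, 0.0], [0.1, 0.1]), "real", None),
    (fuse([1.0, 0.9], [1.0, 0.0]), "fake", "method_A"),
    (fuse([0.9, 1.0], [0.9, 0.1]), "fake", "method_A"),
    (fuse([1.0, 1.0], [0.0, 1.0]), "fake", "method_B"),
    (fuse([0.9, 0.9], [0.1, 0.9]), "fake", "method_B"),
]

# Stage 1 is trained on every sample; stage 2 only on the fake ones.
stage1 = NearestCentroid().fit([x for x, _, _ in train], [a for _, a, _ in train])
fakes = [(x, s) for x, a, s in train if a == "fake"]
stage2 = NearestCentroid().fit([x for x, _ in fakes], [s for _, s in fakes])

result_real = classify(fuse([0.05, 0.05], [0.0, 0.1]), stage1, stage2)
result_fake_a = classify(fuse([0.95, 1.0], [0.95, 0.0]), stage1, stage2)
result_fake_b = classify(fuse([1.0, 0.95], [0.05, 1.0]), stage1, stage2)
```

In this sketch the second classifier is trained only on fake samples and is consulted only when the first stage predicts "fake", which mirrors the staged structure the abstract describes without claiming to reproduce the authors' features or models.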

Author information

Authors and Affiliations

Department of Information Technology, Indian Institute of Engineering Science and Technology, Shibpur, 711103, India
Arnab Kumar Das, Soumik Mukhopadhyay, Arijit Dalui, Ritaban Bhattacharya & Ruchira Naskar

Corresponding author

Correspondence to Arnab Kumar Das.

Editor information

Editors and Affiliations

Griffith University, Gold Coast, QLD, Australia
Vallipuram Muthukkumarasamy
Centre for Development of Advanced Computing (C-DAC), Bangalore, India
Sithu D. Sudarsan
Indian Institute of Technology Bombay, Mumbai, India
Rudrapatna K. Shyamasundar

About this paper

Cite this paper

Das, A.K., Mukhopadhyay, S., Dalui, A., Bhattacharya, R., Naskar, R. (2023). A Multi-stage Multi-modal Classification Model for DeepFakes Combining Deep Learned and Computer Vision Oriented Features. In: Muthukkumarasamy, V., Sudarsan, S.D., Shyamasundar, R.K. (eds) Information Systems Security. ICISS 2023. Lecture Notes in Computer Science, vol 14424. Springer, Cham. https://doi.org/10.1007/978-3-031-49099-6_13

DOI: https://doi.org/10.1007/978-3-031-49099-6_13
Published: 09 December 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-49098-9
Online ISBN: 978-3-031-49099-6
eBook Packages: Computer Science; Computer Science (R0)
