Efficient deepfake detection using shallow vision transformer

Multimedia Tools and Applications

Abstract

Deepfake is a deep learning-based technique that generates fake face images by mimicking the distribution of original images. Deepfake images can be used for malicious purposes such as creating fake news; hence, it is important to detect them at an early stage. Existing work on deepfake detection focuses mainly on appearance-based features and requires substantial computing resources, memory and training data to optimize the model. Since these resources are unavailable in many situations, it is important to develop a lightweight model that can work under constrained resources. In this work, we propose a shallow vision transformer (ViT) for deepfake detection. The proposed model uses an attention mechanism with a multi-head attention module. The attention mechanism highlights the important regions of deepfake images, whereas the multi-head attention module determines how much attention is given to each of the local-level features of an image. Finally, a softmax layer classifies an image as real or fake. The proposed model is shallow in that it has 16.48 times fewer parameters and approximately 2.97 times fewer FLOPs than the baseline vision transformer. Experiments on the Real Fake Face (RFF) and Real and Fake Face Detection (RFFD) datasets show that the model achieves accuracies of \(92.15\%\) and \(88.52\%\) respectively, which are higher than those of many existing state-of-the-art deepfake detection models such as GoogleNet, XceptionNet, ResNet50, MesoNet, CNN and baseline vision transformers. Importantly, the shallow ViT achieves an accuracy of \(90.94\%\) when only half of the RFF dataset is used for training, thereby demonstrating its applicability in constrained scenarios.
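
To illustrate how reducing depth and width shrinks a vision transformer, the following minimal sketch counts the learnable parameters of a standard ViT encoder with a class token. It is not the authors' implementation. The `ViTConfig` fields, the `vit_param_count` helper and the `shallow` settings are hypothetical names and values chosen for illustration, and the default settings assume a ViT-Base/16-like baseline, which may differ from the baseline used in the paper. The number of attention heads does not appear because, in standard multi-head attention, the heads split the embedding width and do not change the parameter count.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViTConfig:
    """Hyperparameters of a plain ViT encoder (illustrative values only)."""
    image_size: int = 224
    patch_size: int = 16
    channels: int = 3
    dim: int = 768        # token embedding width
    depth: int = 12       # number of encoder blocks
    mlp_dim: int = 3072   # hidden width of each block's feed-forward layer
    num_classes: int = 2  # real vs. fake


def vit_param_count(cfg: ViTConfig) -> int:
    """Count learnable parameters of a standard ViT with a class token."""
    num_patches = (cfg.image_size // cfg.patch_size) ** 2
    patch_dim = cfg.patch_size * cfg.patch_size * cfg.channels

    patch_embed = patch_dim * cfg.dim + cfg.dim    # linear projection + bias
    cls_token = cfg.dim
    pos_embed = (num_patches + 1) * cfg.dim        # +1 for the class token

    attention = 4 * (cfg.dim * cfg.dim + cfg.dim)  # Q, K, V and output projections
    mlp = (cfg.dim * cfg.mlp_dim + cfg.mlp_dim) + (cfg.mlp_dim * cfg.dim + cfg.dim)
    layer_norms = 2 * (2 * cfg.dim)                # two LayerNorms per block
    block = attention + mlp + layer_norms

    final_norm = 2 * cfg.dim
    head = cfg.dim * cfg.num_classes + cfg.num_classes

    return (patch_embed + cls_token + pos_embed
            + cfg.depth * block + final_norm + head)


# Assumed baseline: ViT-Base/16-like settings with a two-class head.
baseline = ViTConfig()
# Hypothetical shallow variant; these values are NOT taken from the paper.
shallow = ViTConfig(dim=256, depth=4, mlp_dim=512)

ratio = vit_param_count(baseline) / vit_param_count(shallow)
print(f"baseline parameters: {vit_param_count(baseline):,}")
print(f"shallow parameters:  {vit_param_count(shallow):,}")
print(f"reduction factor:    {ratio:.2f}x")
```

Because each encoder block contributes roughly \(4d^2 + 2d\,d_{\text{mlp}}\) weights for embedding width \(d\), the count grows linearly with depth and quadratically with width, so modest cuts to both yield a large reduction factor.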

Data availability

The datasets generated during and/or analyzed during the current study are available in the Kaggle repositories: https://www.kaggle.com/datasets/xhlulu/140k-real-and-fake-faces and https://www.kaggle.com/datasets/ciplab/real-and-fake-face-detection.

Notes

  1. https://www.snopes.com/fact-check/putin-deepfake-russian-surrender

  2. https://github.com/deepfakes/faceswap

  3. https://github.com/shaoanlu/faceswap-GAN

  4. https://github.com/shaoanlu/fewshot-facetranslation-GAN

  5. https://www.kaggle.com/datasets/xhlulu/140k-real-and-fake-faces

  6. https://www.kaggle.com/datasets/ciplab/real-and-fake-face-detection

Author information

Corresponding author

Correspondence to Debanjan Sadhya.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Usmani, S., Kumar, S. & Sadhya, D. Efficient deepfake detection using shallow vision transformer. Multimed Tools Appl 83, 12339–12362 (2024). https://doi.org/10.1007/s11042-023-15910-z

  • DOI: https://doi.org/10.1007/s11042-023-15910-z
