Efficient deepfake detection using shallow vision transformer

Usmani, Shaheen; Kumar, Sunil; Sadhya, Debanjan

doi:10.1007/s11042-023-15910-z

Published: 24 June 2023

Multimedia Tools and Applications, Volume 83, pages 12339–12362 (2024)

Abstract

Deepfake is a deep learning-based technique that generates fake face images by mimicking the distribution of original images. Because deepfake images can be used maliciously, for example to create fake news, it is important to detect them at an early stage. Existing work on deepfake detection focuses mainly on appearance-based features and requires substantial computing resources, memory, and training data to optimize the model. Since these resources are often unavailable, it is important to develop a lightweight model that works under constrained resources. In this work, we propose a shallow vision transformer (ViT) for deepfake detection. The proposed model combines an attention mechanism with a multi-head attention module. The attention mechanism highlights the important regions of a deepfake image, whereas the multi-head attention module determines how much attention is given to each local-level feature of the image. Finally, a softmax layer classifies the image as real or fake. The proposed model is shallow in that it has 16.48 times fewer parameters and approximately 2.97 times fewer FLOPs than the baseline vision transformer. Experiments on the Real Fake Face (RFF) and Real and Fake Face Detection (RFFD) datasets show that the model achieves accuracies of 92.15% and 88.52%, respectively, which are better than those of many existing state-of-the-art deepfake detection models, such as GoogleNet, XceptionNet, ResNet50, MesoNet, CNN, and the baseline vision transformer. Importantly, the shallow ViT achieves an accuracy of 90.94% when only half of the RFF dataset is used for training, demonstrating its applicability in constrained scenarios.
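
To make the parameter comparison in the abstract concrete, the following is a minimal sketch of how the learnable parameters of a standard vision transformer can be counted from its hyperparameters. The function name `vit_param_count` and the "shallow" configuration (embedding width 192, 4 encoder layers, MLP width 768) are hypothetical choices made for illustration only; the abstract does not state the authors' configuration, so the printed ratio is not expected to reproduce the reported factor of 16.48. The baseline values are those of the widely used ViT-Base/16 layout, which is assumed here to be the kind of baseline the abstract refers to.

```python
def vit_param_count(image_size, patch_size, dim, depth, mlp_dim,
                    num_classes, channels=3):
    """Count learnable parameters of a standard ViT encoder with a linear head.

    The number of attention heads does not appear: in standard multi-head
    attention the embedding width is split across heads, so the Q, K, V and
    output projections have the same size for any head count.
    """
    num_patches = (image_size // patch_size) ** 2
    # Linear projection of flattened patches, plus bias.
    patch_embed = patch_size * patch_size * channels * dim + dim
    cls_token = dim
    # One learned position vector per patch, plus one for the class token.
    pos_embed = (num_patches + 1) * dim
    # Q, K, V and output projections, each dim x dim with a bias.
    attention = 4 * (dim * dim + dim)
    # Two-layer feed-forward block with biases.
    mlp = dim * mlp_dim + mlp_dim + mlp_dim * dim + dim
    # Two LayerNorms per encoder layer, each with a scale and a shift.
    layer_norms = 2 * 2 * dim
    encoder = depth * (attention + mlp + layer_norms)
    final_norm = 2 * dim
    head = dim * num_classes + num_classes
    return patch_embed + cls_token + pos_embed + encoder + final_norm + head


if __name__ == "__main__":
    # ViT-Base/16 layout with a two-class (real vs. fake) head.
    baseline = vit_param_count(image_size=224, patch_size=16, dim=768,
                               depth=12, mlp_dim=3072, num_classes=2)
    # Hypothetical shallow configuration, NOT the configuration from the paper.
    shallow = vit_param_count(image_size=224, patch_size=16, dim=192,
                              depth=4, mlp_dim=768, num_classes=2)
    print(f"baseline parameters: {baseline:,}")
    print(f"shallow parameters:  {shallow:,}")
    print(f"reduction factor:    {baseline / shallow:.2f}x")
```

As a sanity check on the counting rule, calling the function with the ViT-Base/16 layout and a 1000-class head gives 86,567,656 parameters, which agrees with the commonly quoted size of that model. Most of the parameters sit in the encoder layers, so reducing the depth and the embedding width is what shrinks the model.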

Data availability

The datasets generated during and/or analyzed during the current study are available in the Kaggle repositories: https://www.kaggle.com/datasets/xhlulu/140k-real-and-fake-faces and https://www.kaggle.com/datasets/ciplab/real-and-fake-face-detection.

Author information

Authors and Affiliations

Department of Information Technology, ABV-Indian Institute of Information Technology and Management, Gwalior, Madhya Pradesh, India
Shaheen Usmani & Sunil Kumar
Department of Computer Science and Engineering, ABV-Indian Institute of Information Technology and Management, Gwalior, Madhya Pradesh, India
Debanjan Sadhya

Corresponding author

Correspondence to Debanjan Sadhya.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Usmani, S., Kumar, S. & Sadhya, D. Efficient deepfake detection using shallow vision transformer. Multimed Tools Appl 83, 12339–12362 (2024). https://doi.org/10.1007/s11042-023-15910-z

Received: 28 December 2022
Revised: 23 March 2023
Accepted: 22 May 2023
Published: 24 June 2023
Issue Date: January 2024
DOI: https://doi.org/10.1007/s11042-023-15910-z
