poster

Robust DeepFake Detection Method based on Ensemble of ViT and CNN

Authors:

Sangjun Lee

SAC '23: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing

Pages 1092 - 1095

https://doi.org/10.1145/3555776.3577769

Published: 07 June 2023

Abstract

With the development of convolutional neural networks (CNNs) and generative adversarial networks (GANs) in recent years, classifying fake videos produced with DeepFake techniques has become a very difficult task. Most previous studies on DeepFake detection focused on finding DeepFake artifacts with CNNs. CNN-based DeepFake detection achieves high accuracy but is vulnerable to noisy inputs such as side faces, shadowed faces, and low-quality images. In addition, although a CNN can learn quickly thanks to its inductive bias, it tends to overfit to specific datasets and shows low accuracy on manipulated videos created with a type of DeepFake different from those in the training dataset.

In this study, we propose a robust DeepFake detection method that combines vision transformer (ViT) and CNN models. Our experiments showed that the ViT model was highly effective at processing side faces and low-quality videos. Our method, which combines the ResNeSt269 model with the DeiT model using a weighted majority voting ensemble (WMVE) approach, achieved 97.66% accuracy, outperforming the existing state-of-the-art model on the DeepFake Detection Challenge (DFDC), which achieved 96.78% accuracy. In addition, when benchmarked on a dataset completely different from the training dataset, our method is robust to the new dataset, showing more than 10% higher accuracy than the CNN model owing to the high generalization performance of ViT.
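The abstract does not spell out how the weighted majority voting ensemble is computed, so the following is only a minimal sketch of the general technique, not the authors' implementation. It assumes each model emits either a hard label (0 = real, 1 = fake) or a fake-probability per input; the function names, the weights, and the 0.5 threshold are hypothetical values chosen for illustration.

```python
from typing import Sequence, Tuple


def weighted_majority_vote(labels: Sequence[int], weights: Sequence[float]) -> int:
    """Return the class label whose voters carry the largest total weight.

    labels[i] is the hard prediction of model i (0 = real, 1 = fake) and
    weights[i] is that model's voting weight (hypothetical values here).
    """
    if not labels or len(labels) != len(weights):
        raise ValueError("labels and weights must be non-empty and equally long")
    totals = {}
    for label, weight in zip(labels, weights):
        totals[label] = totals.get(label, 0.0) + weight
    # Iterating over sorted labels makes ties resolve to the smaller label.
    return max(sorted(totals), key=lambda label: totals[label])


def weighted_soft_vote(
    fake_probs: Sequence[float],
    weights: Sequence[float],
    threshold: float = 0.5,
) -> Tuple[int, float]:
    """Weighted average of per-model fake probabilities, then a threshold.

    This is a soft-voting variant; with only two models, hard weighted
    voting always follows the heavier model when the two disagree.
    """
    if not fake_probs or len(fake_probs) != len(weights):
        raise ValueError("fake_probs and weights must be non-empty and equally long")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(p * w for p, w in zip(fake_probs, weights)) / total_weight
    return int(score >= threshold), score


# Hypothetical outputs for one face crop: a CNN says "probably real",
# a ViT says "probably fake", and the ViT is given the larger weight.
cnn_prob, vit_prob = 0.40, 0.80
label, score = weighted_soft_vote([cnn_prob, vit_prob], weights=[0.4, 0.6])
print(label, round(score, 2))
```

In this example the weighted score is about 0.64, so the ensemble labels the input as fake even though the CNN alone would not. How the weights are chosen (for example, from validation accuracy) is a design decision the abstract leaves open.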

Cited By

Bhandari M, Shrestha S, Karki U, Adhikari S, Gaihre R. (2024). Predicting manipulated regions in deepfake videos using convolutional vision transformers. Computing and Artificial Intelligence 2:2 (1409). Online publication date: 19-Jul-2024.
https://doi.org/10.59400/cai.v2i2.1409
Kalemullah AP PV S. (2024). Deepfake Classification For Human Faces using Custom CNN. 2024 7th International Conference on Circuit Power and Computing Technologies (ICCPCT), 744-750. Online publication date: 8-Aug-2024.
https://doi.org/10.1109/ICCPCT61902.2024.10672973

Index Terms

Robust DeepFake Detection Method based on Ensemble of ViT and CNN
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems

Information & Contributors

Information

Published In

SAC '23: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing

March 2023

1932 pages

ISBN:9781450395175

DOI:10.1145/3555776

Conference Chairs:
Jiman Hong, Soongsil University, South Korea
Maart Lanperne, Tallinn University, Estonia

Program Chairs:
Juw Won Park, University of Louisville, USA
Tomas Cerny, Baylor University, USA

Publication Chair:
Hossain Shahriar, Kennesaw State University, USA

Copyright © 2023 Owner/Author(s).

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

Sponsors

SIGAPP: ACM Special Interest Group on Applied Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 07 June 2023

Qualifiers

Poster

Funding Sources

National Research Foundation of Korea
Institute for Information & communications Technology Planning & Evaluation

Conference

SAC '23

Sponsor:

SIGAPP

SAC '23: 38th ACM/SIGAPP Symposium on Applied Computing

March 27 - 31, 2023

Tallinn, Estonia

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
