DOI: 10.1145/3555776.3577769
poster

Robust DeepFake Detection Method based on Ensemble of ViT and CNN

Published: 07 June 2023

Abstract

With the development of convolutional neural networks (CNN) and generative adversarial networks (GAN) in recent years, classifying fake videos produced through DeepFake has become a very difficult task. Most previous studies on DeepFake detection focused on finding DeepFake artifacts with CNNs. CNN-based DeepFake detection achieves high accuracy but is vulnerable to noisy inputs such as side faces, shadowed faces, and low-quality images. In addition, although a CNN can learn quickly thanks to its inductive bias, it tends to overfit to specific datasets and shows low accuracy on manipulated videos created with a type of DeepFake different from those in the training dataset.
In this study, we propose a robust DeepFake detection method that combines vision transformer (ViT) and CNN models. Our experiments showed that the ViT model was highly effective in processing side faces and low-quality videos. Our method, which combines the ResNeSt269 model with the DeiT model using a weighted majority voting ensemble (WMVE) approach, achieved 97.66% accuracy, outperforming the state-of-the-art model of the DeepFake Detection Challenge (DFDC), which achieved 96.78% accuracy. In addition, when benchmarked on a dataset completely different from the training dataset, our method is robust to the new dataset, showing more than 10% higher accuracy than the CNN model owing to the high generalization performance of ViT.
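
The abstract does not say how the weighted majority voting ensemble assigns weights or which votes it combines, so the following is only a minimal sketch of the general technique: each voter casts a hard class label, and the label with the largest summed weight wins. The function name, the label convention (0 = real, 1 = fake), and the example weights are assumptions made for illustration, not values from the paper.

```python
from typing import Sequence


def weighted_majority_vote(
    votes: Sequence[int],
    weights: Sequence[float],
    num_classes: int = 2,
) -> int:
    """Return the class label whose voters carry the largest total weight.

    votes[i] is the hard label predicted by voter i (assumed: 0 = real, 1 = fake).
    weights[i] is the weight given to voter i (hypothetical values in the demo).
    """
    if len(votes) != len(weights):
        raise ValueError("votes and weights must have the same length")

    # Accumulate the weight behind each candidate label.
    totals = [0.0] * num_classes
    for label, weight in zip(votes, weights):
        totals[label] += weight

    # max() returns the first maximum, so ties resolve to the lowest label.
    return max(range(num_classes), key=lambda c: totals[c])


if __name__ == "__main__":
    # Three hypothetical voters: the single heavy voter outweighs the other two.
    print(weighted_majority_vote([1, 0, 0], [0.6, 0.2, 0.1]))
```

With exactly two voters casting hard labels, the voter with the larger weight always decides the outcome, so this form of voting only becomes informative with more voters or with soft scores in place of hard labels.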


Cited By

  • (2024) Predicting manipulated regions in deepfake videos using convolutional vision transformers. Computing and Artificial Intelligence 2:2 (1409). DOI: 10.59400/cai.v2i2.1409. Online publication date: 19-Jul-2024
  • (2024) Deepfake Classification For Human Faces using Custom CNN. 2024 7th International Conference on Circuit Power and Computing Technologies (ICCPCT), 744-750. DOI: 10.1109/ICCPCT61902.2024.10672973. Online publication date: 8-Aug-2024

    Published In

    SAC '23: Proceedings of the 38th ACM/SIGAPP Symposium on Applied Computing
    March 2023
    1932 pages
    ISBN:9781450395175
    DOI:10.1145/3555776
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. DeepFake detection
    2. video manipulation
    3. CNN
    4. vision transformer
    5. weighted voting ensemble

    Qualifiers

    • Poster

    Acceptance Rates

    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
