
Hierarchical Learning and Dummy Triplet Loss for Efficient Deepfake Detection

Published: 09 December 2023

Abstract

The advancement of generative models has made it easier to create highly realistic Deepfake videos. This accessibility has led to a surge in research on Deepfake detection to mitigate potential misuse. Typically, Deepfake detection models utilize binary backbones, even though the training dataset contains additional exploitable information, such as the Deepfake generation method employed for each video. However, recent findings suggest that inferring a binary class from a multi-class backbone yields superior performance compared to directly employing a binary backbone. Building upon this research, our article introduces two novel methods to infer a binary class from a multi-class backbone. The first method, named root dummies, leverages the dummy triplet loss, which employs fixed vectors (i.e., dummies) instead of mined positives and negatives in the triplet loss. By training the multi-class backbone with these dummies, we can easily infer a binary class during testing by adjusting the number of dummies (from six during training to two during inference). Through this approach, we achieve an accuracy improvement of 0.23% compared to the existing inference method, without requiring additional training. The second proposed method is transfer learning. It involves training a classifier, such as a support vector machine, to predict binary classes based on the image embeddings generated by the multi-class backbone. Although this method necessitates additional training, it further enhances the model’s performance, resulting in an accuracy increase of 1.79%. In summary, our proposed methods improve the accuracy of Deepfake detection by simply modifying the number of classes during training, making them suitable for integration into a variety of existing Deepfake training pipelines. Additionally, to foster reproducible research, we have made the source code of our solution publicly available at https://github.com/beuve/DmyT.
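The core idea of the dummy triplet loss is that mined positives and negatives are replaced by fixed per-class target vectors, so a binary real/fake decision at test time reduces to comparing distances against the dummies. Below is a minimal sketch of that mechanism; the one-hot dummy vectors and the real-dummy-vs.-nearest-fake-dummy reduction are illustrative placeholders, not the exact construction used in the authors' DmyT code.

```python
import numpy as np

def dummy_triplet_loss(embeddings, labels, dummies, margin=1.0):
    """Triplet loss with fixed class 'dummies' as positives/negatives.

    For each embedding, the positive is the dummy of its own class and
    the negative is the nearest dummy of any other class, so no triplet
    mining is required.
    """
    total = 0.0
    for emb, lab in zip(embeddings, labels):
        d_pos = np.linalg.norm(emb - dummies[lab])
        d_neg = min(np.linalg.norm(emb - d)
                    for c, d in enumerate(dummies) if c != lab)
        total += max(0.0, d_pos - d_neg + margin)
    return total / len(embeddings)

def binary_from_dummies(embedding, dummies, real_class=0):
    """Infer real/fake by comparing the real dummy against the nearest
    fake dummy (an illustrative stand-in for the six-to-two reduction)."""
    d_real = np.linalg.norm(embedding - dummies[real_class])
    d_fake = min(np.linalg.norm(embedding - d)
                 for c, d in enumerate(dummies) if c != real_class)
    return "fake" if d_fake < d_real else "real"

# Six classes during training (real + five generation methods);
# one-hot vectors stand in for the paper's fixed dummy targets.
dummies = np.eye(6)
print(dummy_triplet_loss(dummies[[0, 3]], [0, 3], dummies))  # 0.0
print(binary_from_dummies(dummies[3], dummies))              # fake
```

The second proposed method, fitting a binary classifier (e.g., a support vector machine) on the multi-class backbone's embeddings, would simply replace `binary_from_dummies` with a learned decision function over the same embedding space.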


Cited By

  • (2025) Bi-LORA: A Vision-Language Approach for Synthetic Image Detection. Expert Systems 42, 2. DOI: 10.1111/exsy.13829. Online publication date: 8 January 2025.


Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 3
March 2024
665 pages
EISSN: 1551-6865
DOI: 10.1145/3613614
Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 09 December 2023
Online AM: 05 October 2023
Accepted: 18 September 2023
Revised: 18 July 2023
Received: 10 June 2022
Published in TOMM Volume 20, Issue 3


Author Tags

  1. Deepfake forensics
  2. neural networks
  3. metric learning

Qualifiers

  • Research-article
