DOI: 10.1145/3581783.3613842

Spatio-Temporal Catcher: A Self-Supervised Transformer for Deepfake Video Detection

Published: 27 October 2023

ABSTRACT

As deepfake technology becomes increasingly sophisticated and accessible, it has become easier for individuals with malicious intent to create convincing fake content, raising considerable concern in the multimedia and computer vision community. Despite significant advances in deepfake video detection, most existing methods focus on model architecture and training procedures, with little attention to the data perspective. In this paper, we argue that data quality has become the main bottleneck of current research. Specifically, in the pre-training phase, the domain shift between the pre-training and target datasets may lead to poor generalization; in the training phase, the low fidelity of existing datasets leads detectors to rely on specific low-level visual artifacts or inconsistencies. To overcome these shortcomings, (1) in the pre-training phase, we pre-train our model on high-quality facial videos using data-efficient reconstruction-based self-supervised learning to mitigate domain shift; (2) in the training phase, we develop a novel spatio-temporal generator that synthesizes diverse high-quality "fake" videos in large quantities at low cost, enabling our model to learn more general spatio-temporal representations in a self-supervised manner; (3) additionally, to take full advantage of the synthetic "fake" videos, we adopt diversity losses at both the frame and video levels to explore the diversity of clues in "fake" videos. Our proposed framework is data-efficient and does not require any real-world deepfake videos. Extensive experiments demonstrate that our method significantly improves generalization; on the most challenging CDF and DFDC datasets in particular, it outperforms the baselines by 8.88 and 7.73 percentage points, respectively.
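
As a concrete illustration of the third ingredient, the following is a minimal sketch, assuming a PyTorch setup, of frame- and video-level diversity losses in the spirit described above: embeddings of different synthetic "fake" samples are pushed apart by penalizing their mean pairwise cosine similarity, and the two terms are added to an ordinary real/fake classification loss. Every name here (diversity_loss, total_loss, lambda_frame, lambda_video) and the exact formulation are illustrative assumptions rather than the paper's actual implementation.

    import torch
    import torch.nn.functional as F

    def diversity_loss(emb: torch.Tensor) -> torch.Tensor:
        # emb: (N, D) embeddings of N synthetic "fake" samples.
        # Penalize the mean pairwise cosine similarity so the encoder
        # spreads different fakes apart in representation space.
        z = F.normalize(emb, dim=-1)                       # unit-norm rows
        sim = z @ z.t()                                    # (N, N) cosine similarities
        off_diag = ~torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
        return sim[off_diag].mean()                        # lower value = more diverse

    def total_loss(frame_emb, video_emb, logits, labels,
                   lambda_frame=0.1, lambda_video=0.1):
        # Hypothetical objective: binary real/fake classification plus
        # diversity terms at the frame level (B*T frame embeddings) and
        # the video level (B clip embeddings). In a real pipeline the
        # diversity terms would be restricted to the synthetic fakes.
        cls = F.binary_cross_entropy_with_logits(logits, labels.float())
        l_frame = diversity_loss(frame_emb.flatten(0, 1))  # (B, T, D) -> (B*T, D)
        l_video = diversity_loss(video_emb)                # (B, D)
        return cls + lambda_frame * l_frame + lambda_video * l_video

    if __name__ == "__main__":
        B, T, D = 4, 8, 256                                # toy batch: 4 clips, 8 frames each
        frame_emb = torch.randn(B, T, D)
        video_emb = torch.randn(B, D)
        logits = torch.randn(B)                            # detector scores
        labels = torch.randint(0, 2, (B,))                 # 0 = real, 1 = fake
        print(total_loss(frame_emb, video_emb, logits, labels).item())

A cosine-similarity repulsion is only one plausible way to encourage diverse forgery clues; the paper's own frame- and video-level losses may be defined differently.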

    • Published in

      MM '23: Proceedings of the 31st ACM International Conference on Multimedia
      October 2023
      9913 pages
      ISBN: 9798400701085
      DOI: 10.1145/3581783

      Copyright © 2023 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 995 of 4,171 submissions, 24%
