
Video Forgery Detection Using Spatio-Temporal Dual Transformer

Authors:

Chenyu Liu,

Jia Li,

Junxian Duan,

Huaibo Huang

ICCPR '22: Proceedings of the 2022 11th International Conference on Computing and Pattern Recognition

Pages 273 - 281

https://doi.org/10.1145/3581807.3581847

Published: 22 May 2023


Abstract

Fake videos produced by deep generative models pose a potential threat to social stability, which makes detecting them critical. Although previous detection methods achieve high accuracy, they generalize poorly to different datasets and to realistic scenes. We identify several novel temporal and spatial clues. In the frequency domain, the differences between real and fake videos are significantly more pronounced across frames (inter-frame) than within a single frame (intra-frame). In the shallow textures of the CbCr color channels, the forged regions of fake videos show more distinct blurring than real videos do. In addition, the optical flow of a real video changes gradually, whereas that of a fake video changes drastically. This paper proposes a spatio-temporal dual Transformer network for video forgery detection that integrates these spatio-temporal clues with the temporal consistency of consecutive frames to improve generalization. Specifically, an EfficientNet is first used to extract spatial artifacts from shallow textures and high-frequency information. We add a new loss function to EfficientNet to extract more robust face features and introduce an attention mechanism to enhance the extracted features. Next, a Swin Transformer is used to capture the subtle temporal artifacts in the inter-frame spectrum differences and the optical flow. A feature interaction module is added to fuse local features and global representations. Finally, another Swin Transformer classifies the videos according to the extracted spatio-temporal features. We evaluate our method on datasets including FaceForensics++, Celeb-DF (v2) and DFDC. Extensive experiments show that the proposed framework achieves high accuracy and generalizes well, outperforming current state-of-the-art methods.
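The abstract does not say how the CbCr and inter-frame spectrum clues are computed, so the following is only a minimal sketch of one plausible preprocessing step, not the authors' implementation. It assumes a full-range BT.601 (JPEG-style) RGB-to-YCbCr conversion and a log-magnitude 2-D FFT as the spectrum; the function names `rgb_to_cbcr`, `log_spectrum`, and `inter_frame_spectrum_diff` are hypothetical.

```python
import numpy as np


def rgb_to_cbcr(frame_rgb):
    """Return the Cb and Cr planes of an (H, W, 3) RGB frame.

    Assumption: full-range BT.601 coefficients, as used by JPEG.
    """
    rgb = frame_rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return cb, cr


def log_spectrum(gray):
    """Centered log-magnitude spectrum of a 2-D grayscale frame."""
    return np.log1p(np.abs(np.fft.fftshift(np.fft.fft2(gray))))


def inter_frame_spectrum_diff(frames_gray):
    """Absolute spectrum difference between consecutive frames.

    frames_gray: sequence of T frames, each (H, W).
    Returns an array of shape (T - 1, H, W).
    """
    spectra = np.stack([log_spectrum(f) for f in frames_gray])
    return np.abs(np.diff(spectra, axis=0))
```

In a pipeline like the one described, the Cb and Cr planes would feed the spatial branch and the stacked spectrum differences would feed the temporal branch. The actual inputs, normalization, and face cropping used in the paper may differ.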

