research-article

Latent Pattern Sensing: Deepfake Video Detection via Predictive Representation Learning

Authors:

Dan Zeng

MMAsia '21: Proceedings of the 3rd ACM International Conference on Multimedia in Asia

Article No.: 6, Pages 1 - 7

https://doi.org/10.1145/3469877.3490586

Published: 10 January 2022

Abstract

Increasingly advanced deepfake approaches have made the detection of deepfake videos very challenging. We observe that typical deepfake videos often exhibit appearance-level temporal inconsistencies in some facial components between frames, resulting in discriminable spatiotemporal latent patterns in semantic-level feature maps. Inspired by this finding, we propose a predictive representation learning approach, termed Latent Pattern Sensing, that captures these semantic change characteristics for deepfake video detection. The approach cascades a CNN-based encoder, a ConvGRU-based aggregator, and a single-layer binary classifier. The encoder and aggregator are pre-trained in a self-supervised manner to produce representative spatiotemporal context features. Finally, the classifier is trained on these context features to distinguish fake videos from real ones. In this manner, the extracted features simultaneously describe the latent patterns of videos across frames, spatially and temporally, in a unified way, yielding an effective deepfake video detector. Extensive experiments demonstrate the effectiveness of our approach; for example, it outperforms 10 state-of-the-art methods by at least 7.92% in AUC on the challenging Celeb-DF (v2) benchmark.
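As a rough illustration of the cascade described in the abstract, here is a minimal NumPy sketch of the forward pass only: a per-frame CNN encoder, a ConvGRU aggregator that carries a spatial hidden state across frames, and a single-layer binary classifier. Everything concrete in it is an assumption rather than the paper's configuration: the one-layer 3x3 encoder, the channel sizes, the global average pooling before the classifier, the random weights, and the names `conv2d_same`, `ConvGRUCell`, and `LatentPatternSketch`, which are hypothetical. The self-supervised predictive pre-training of the encoder and aggregator is not shown.

```python
import numpy as np


def conv2d_same(x, w, b):
    """Stride-1, zero-padded ('same') 2D cross-correlation.

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k; b: (C_out,).
    """
    c_out, c_in, k, _ = w.shape
    assert x.shape[0] == c_in, "channel mismatch"
    p = k // 2
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding on H and W only
    out = np.zeros((c_out, h, wd))
    for dy in range(k):
        for dx in range(k):
            # Add the contribution of kernel tap (dy, dx) at every spatial position.
            out += np.einsum("oc,chw->ohw", w[:, :, dy, dx], xp[:, dy:dy + h, dx:dx + wd])
    return out + b[:, None, None]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ConvGRUCell:
    """GRU cell whose gates are convolutions, so the hidden state stays a feature map."""

    def __init__(self, c_in, c_hid, k=3, rng=None):
        rng = np.random.default_rng(0) if rng is None else rng
        c = c_in + c_hid  # each gate convolves [x_t, h_{t-1}] stacked on channels
        self.c_hid = c_hid
        self.wz = rng.normal(0.0, 0.1, (c_hid, c, k, k))  # update gate
        self.wr = rng.normal(0.0, 0.1, (c_hid, c, k, k))  # reset gate
        self.wh = rng.normal(0.0, 0.1, (c_hid, c, k, k))  # candidate state
        self.bz = self.br = self.bh = np.zeros(c_hid)

    def step(self, x, h):
        xh = np.concatenate([x, h], axis=0)
        z = sigmoid(conv2d_same(xh, self.wz, self.bz))
        r = sigmoid(conv2d_same(xh, self.wr, self.br))
        h_cand = np.tanh(conv2d_same(np.concatenate([x, r * h], axis=0), self.wh, self.bh))
        return (1.0 - z) * h + z * h_cand  # convex mix of old and candidate state


class LatentPatternSketch:
    """Hypothetical encoder -> ConvGRU aggregator -> single-layer classifier."""

    def __init__(self, c_img=3, c_feat=8, c_ctx=8, seed=0):
        rng = np.random.default_rng(seed)
        # Assumption: a single 3x3 conv + ReLU stands in for the CNN encoder.
        self.enc_w = rng.normal(0.0, 0.1, (c_feat, c_img, 3, 3))
        self.enc_b = np.zeros(c_feat)
        self.gru = ConvGRUCell(c_feat, c_ctx, rng=rng)
        # Single linear layer producing one logit (fake vs. real).
        self.cls_w = rng.normal(0.0, 0.1, c_ctx)
        self.cls_b = 0.0

    def encode(self, frame):
        return np.maximum(conv2d_same(frame, self.enc_w, self.enc_b), 0.0)

    def context(self, frames):
        """frames: (T, C_img, H, W) -> context feature map (C_ctx, H, W)."""
        feats = [self.encode(f) for f in frames]
        h = np.zeros((self.gru.c_hid,) + feats[0].shape[1:])
        for f in feats:  # aggregate per-frame features over time
            h = self.gru.step(f, h)
        return h

    def fake_probability(self, frames):
        pooled = self.context(frames).mean(axis=(1, 2))  # assumption: global average pooling
        return float(sigmoid(pooled @ self.cls_w + self.cls_b))


clip = np.random.default_rng(1).random((4, 3, 16, 16))  # 4 RGB frames of 16x16 (toy input)
model = LatentPatternSketch()
print(model.context(clip).shape)  # (8, 16, 16)
print(model.fake_probability(clip))
```

Feeding each gate a channel-wise concatenation of x_t and h_{t-1} is equivalent to the usual form W * x_t + U * h_{t-1}, because a convolution is linear in its input channels. Since every update is a convex mix of the previous state and a tanh candidate, the context map stays within [-1, 1] when the state starts at zero.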

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
      2. Computer vision tasks
        Activity recognition and understanding
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Information systems

Index terms have been assigned to the content through auto-classification.

Information

Published In

MMAsia '21: Proceedings of the 3rd ACM International Conference on Multimedia in Asia

December 2021

508 pages

ISBN:9781450386074

DOI:10.1145/3469877

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 January 2022

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

National Key Research and Development Plan

Conference

MMAsia '21: ACM Multimedia Asia

December 1-3, 2021, Gold Coast, Australia

Sponsor: SIGMM

Acceptance Rates

Overall Acceptance Rate 59 of 204 submissions, 29%

Bibliometrics

Total citations: 7

Total downloads: 265 (45 in the last 12 months; 3 in the last 6 weeks), reflecting downloads up to 16 Feb 2025

Cited By

L. Dong, Y. Xu, J. Zhong, Z. Qi, and W. Zhang. 2024. Improving Sequential DeepFake Detection with Local information enhancement. In Proceedings of the 6th ACM International Conference on Multimedia in Asia. 1-1. Online publication date: 3-Dec-2024.
https://dl.acm.org/doi/10.1145/3696409.3700276

Z. Guo, Z. Jia, L. Wang, D. Wang, G. Yang, and N. Kasabov. 2024. Constructing New Backbone Networks via Space-Frequency Interactive Convolution for Deepfake Detection. IEEE Transactions on Information Forensics and Security 19, 401-413. Online publication date: 1-Jan-2024.
https://dl.acm.org/doi/10.1109/TIFS.2023.3324739

E. Lundberg and P. Mozelius. 2024. The potential effects of deepfakes on news media and entertainment. AI & SOCIETY. Online publication date: 23-Oct-2024.
https://doi.org/10.1007/s00146-024-02072-1

E. Prashnani, K. Nagano, S. De Mello, D. Luebke, and O. Gallo. 2024. Avatar Fingerprinting for Authorized Use of Synthetic Talking-Head Videos. In Computer Vision – ECCV 2024. 209-228. Online publication date: 22-Nov-2024.
https://doi.org/10.1007/978-3-031-72633-0_12

J. Costales, S. Shiromani, and M. Devaraj. 2023. The Impact of Blockchain Technology to Protect Image and Video Integrity from Identity Theft using Deepfake Analyzer. In 2023 International Conference on Innovative Data Communication Technologies and Application (ICIDCA). 730-733. Online publication date: 14-Mar-2023.
https://doi.org/10.1109/ICIDCA56705.2023.10099668

L. Zhao, M. Zhang, H. Ding, and X. Cui. 2023. Fine-grained deepfake detection based on cross-modality attention. Neural Computing and Applications 35, 15, 10861-10874. Online publication date: 31-Jan-2023.
https://dl.acm.org/doi/10.1007/s00521-023-08271-z

S. Ge, F. Lin, C. Li, D. Zhang, W. Wang, and D. Zeng. 2022. Deepfake Video Detection via Predictive Representation Learning. ACM Transactions on Multimedia Computing, Communications, and Applications 18, 2s, 1-21. Online publication date: 6-Oct-2022.
https://dl.acm.org/doi/10.1145/3536426
