DOI: 10.1145/3469877.3490586

Latent Pattern Sensing: Deepfake Video Detection via Predictive Representation Learning

Published: 10 January 2022

Abstract

Increasingly advanced deepfake techniques have made detecting deepfake videos very challenging. We observe that deepfake videos generally exhibit appearance-level temporal inconsistencies in some facial components between frames, resulting in discriminable spatiotemporal latent patterns among semantic-level feature maps. Inspired by this finding, we propose a predictive representation learning approach termed Latent Pattern Sensing to capture these semantic change characteristics for deepfake video detection. The approach cascades a CNN-based encoder, a ConvGRU-based aggregator, and a single-layer binary classifier. The encoder and aggregator are pre-trained in a self-supervised manner to form representative spatiotemporal context features. The classifier is then trained on these context features to distinguish fake videos from real ones. In this manner, the extracted features describe the latent patterns of videos across frames spatially and temporally in a unified way, leading to an effective deepfake video detector. Extensive experiments demonstrate the effectiveness of our approach, e.g., it surpasses 10 state-of-the-art methods by at least 7.92% in AUC on the challenging Celeb-DF (v2) benchmark.
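
The cascade described in the abstract (encoder, then ConvGRU aggregator, then single-layer classifier) can be illustrated with a minimal forward-pass sketch in NumPy. This is not the authors' implementation: the one-layer encoder, the 3x3 kernels, the channel counts, the global average pooling before the classifier, and the random weights are all assumptions made for illustration, and the self-supervised pre-training stage is not shown.

```python
import numpy as np

def conv2d(x, w, b):
    """Same-padded 2D cross-correlation. x: (C_in, H, W), w: (C_out, C_in, k, k)."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Add the contribution of kernel tap (i, j) at every spatial position.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out + b[:, None, None]

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def init_conv(rng, c_out, c_in, k=3):
    # Hypothetical random initialisation; real weights would come from training.
    return rng.normal(0, 0.1, (c_out, c_in, k, k)), np.zeros(c_out)

def encode(frame, params):
    """Stand-in CNN encoder (assumed): one conv + ReLU, then 2x spatial subsampling."""
    w, b = params
    return np.maximum(conv2d(frame, w, b), 0.0)[:, ::2, ::2]

def convgru_step(x, h, params):
    """One ConvGRU update: gates are convolutions over the stacked input and state."""
    (wz, bz), (wr, br), (wh, bh) = params
    xh = np.concatenate([x, h], axis=0)
    z = sigmoid(conv2d(xh, wz, bz))          # update gate
    r = sigmoid(conv2d(xh, wr, br))          # reset gate
    h_cand = np.tanh(conv2d(np.concatenate([x, r * h], axis=0), wh, bh))
    return (1.0 - z) * h + z * h_cand

def fake_probability(frames, enc, gru, clf, hidden):
    """Encode each frame, aggregate over time, classify the final context feature."""
    feats = [encode(f, enc) for f in frames]
    h = np.zeros((hidden,) + feats[0].shape[1:])
    for x in feats:
        h = convgru_step(x, h, gru)
    context = h.mean(axis=(1, 2))            # assumed global average pooling
    w, b = clf
    return float(sigmoid(w @ context + b))   # single-layer binary classifier

rng = np.random.default_rng(0)
feat_ch, hidden = 4, 8                       # illustrative sizes, not from the paper
enc = init_conv(rng, feat_ch, 3)
gru = [init_conv(rng, hidden, feat_ch + hidden) for _ in range(3)]
clf = (rng.normal(0, 0.1, hidden), 0.0)
video = rng.random((5, 3, 16, 16))           # 5 synthetic RGB frames
p = fake_probability(video, enc, gru, clf, hidden)
print(f"P(fake) = {p:.3f}")
```

With untrained weights the printed probability is meaningless; the sketch only shows how per-frame feature maps keep their spatial layout while the recurrent state accumulates temporal context, which is what lets a single linear layer operate on a joint spatiotemporal feature.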

          Published In

          MMAsia '21: Proceedings of the 3rd ACM International Conference on Multimedia in Asia
          December 2021
          508 pages
          ISBN:9781450386074
          DOI:10.1145/3469877

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Author Tags

          1. Deepfake video detection
          2. predictive representation learning
          3. self-supervised learning

          Qualifiers

          • Research-article
          • Research
          • Refereed limited

          Funding Sources

          • National Key Research and Development Plan

          Conference

          MMAsia '21
          MMAsia '21: ACM Multimedia Asia
          December 1 - 3, 2021
          Gold Coast, Australia

          Acceptance Rates

          Overall Acceptance Rate 59 of 204 submissions, 29%
