Latent Pattern Sensing: Deepfake Video Detection via Predictive Representation Learning

Published: 10 January 2022


Increasingly advanced deepfake approaches have made the detection of deepfake videos very challenging. We observe that deepfake videos in general often exhibit appearance-level temporal inconsistencies in some facial components between frames, resulting in discriminable spatiotemporal latent patterns among semantic-level feature maps. Inspired by this finding, we propose a predictive representation learning approach, termed Latent Pattern Sensing, to capture these semantic change characteristics for deepfake video detection. The approach cascades a CNN-based encoder, a ConvGRU-based aggregator and a single-layer binary classifier. The encoder and aggregator are pre-trained in a self-supervised manner to form representative spatiotemporal context features. Finally, the classifier is trained on the context features to distinguish fake videos from real ones. In this manner, the extracted features describe the latent patterns of videos across frames, both spatially and temporally, in a unified way, leading to an effective deepfake video detector. Extensive experiments demonstrate our approach's effectiveness; for example, it surpasses 10 state-of-the-art methods by at least 7.92% AUC on the challenging Celeb-DF(v2) benchmark.
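The three-stage pipeline described above (a CNN encoder feeding a ConvGRU aggregator, self-supervised predictive pre-training, then a single-layer binary classifier on the context features) can be sketched in NumPy as follows. This is a minimal, hypothetical forward-pass sketch, not the authors' implementation: the one-layer toy encoder, the layer sizes, the 1x1 prediction head and the CPC/DPC-style InfoNCE next-step objective are all assumptions, since the abstract does not specify them, and every class and function name is illustrative. Only forward passes and the loss are computed; gradient-based training is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_same(x, w):
    """'Same'-padded 2D cross-correlation. x: (H, W, Cin), w: (k, k, Cin, Cout)."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    H, W, _ = x.shape
    out = np.zeros((H, W, w.shape[3]))
    for i in range(k):
        for j in range(k):
            out += np.tensordot(xp[i:i + H, j:j + W, :], w[i, j], axes=([2], [0]))
    return out

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init(k, cin, cout):
    return rng.normal(0.0, 0.1, size=(k, k, cin, cout))

class ToyEncoder:
    """Hypothetical stand-in for the CNN encoder: 3x3 conv + ReLU + 2x2 average pooling."""
    def __init__(self, cin, cout):
        self.w = init(3, cin, cout)

    def __call__(self, frame):
        f = np.maximum(conv2d_same(frame, self.w), 0.0)
        H, W, C = f.shape  # H and W must be even for 2x2 pooling
        return f.reshape(H // 2, 2, W // 2, 2, C).mean(axis=(1, 3))

class ConvGRUCell:
    """Convolutional GRU (biases omitted): gates are 3x3 convs over [input, hidden]."""
    def __init__(self, cin, chid):
        self.wz = init(3, cin + chid, chid)
        self.wr = init(3, cin + chid, chid)
        self.wh = init(3, cin + chid, chid)

    def __call__(self, x, h):
        xh = np.concatenate([x, h], axis=-1)
        z = sigmoid(conv2d_same(xh, self.wz))  # update gate
        r = sigmoid(conv2d_same(xh, self.wr))  # reset gate
        h_tilde = np.tanh(conv2d_same(np.concatenate([x, r * h], axis=-1), self.wh))
        return (1.0 - z) * h + z * h_tilde

def aggregate(encoder, gru, clip, chid):
    """Encode each frame, then roll the ConvGRU over time; return feature maps and contexts."""
    feats = [encoder(f) for f in clip]
    h = np.zeros(feats[0].shape[:2] + (chid,))
    contexts = []
    for z in feats:
        h = gru(z, h)
        contexts.append(h)
    return feats, contexts

def infonce_next_step(pred, target):
    """Assumed CPC/DPC-style loss: each predicted vector must identify its own spatial
    position among all positions of the true next-step feature map. pred, target: (H, W, C)."""
    p = pred.reshape(-1, pred.shape[-1])
    t = target.reshape(-1, target.shape[-1])
    logits = p @ t.T                                  # (N, N) similarity scores
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))               # positives lie on the diagonal

def classify(context, w, b):
    """Single-layer binary classifier on the spatially pooled context: returns P(fake)."""
    v = context.mean(axis=(0, 1))
    return sigmoid(v @ w + b)

# Toy run on a random 4-frame clip (all sizes are arbitrary assumptions).
T, H, W, C_IN, C_FEAT, C_HID = 4, 8, 8, 3, 8, 8
clip = rng.normal(size=(T, H, W, C_IN))
encoder = ToyEncoder(C_IN, C_FEAT)
gru = ConvGRUCell(C_FEAT, C_HID)
w_pred = init(1, C_HID, C_FEAT)  # hypothetical 1x1 prediction head

feats, contexts = aggregate(encoder, gru, clip, C_HID)
ssl_loss = np.mean([infonce_next_step(conv2d_same(contexts[t], w_pred), feats[t + 1])
                    for t in range(T - 1)])

w_cls, b_cls = rng.normal(0.0, 0.1, size=C_HID), 0.0
p_fake = classify(contexts[-1], w_cls, b_cls)
print(feats[0].shape, contexts[-1].shape, float(ssl_loss), float(p_fake))
```

In the two-stage scheme the abstract describes, `ssl_loss` would first be minimized over the encoder, ConvGRU and prediction head; the classifier weights would then be fit on the resulting context features with a binary real/fake label.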


          Published In

          MMAsia '21: Proceedings of the 3rd ACM International Conference on Multimedia in Asia
          December 2021
          508 pages


          Author Tags

          1. Deepfake video detection
          2. predictive representation learning
          3. self-supervised learning


          Funding Sources

          • National Key Research and Development Plan


          MMAsia '21
          MMAsia '21: ACM Multimedia Asia
          December 1 - 3, 2021
          Gold Coast, Australia

