Visual Explanations for Exposing Potential Inconsistency of Deepfakes

Pei, Pengfei; Zhao, Xianfeng; Cao, Yun; Hu, Chengqiao

doi:10.1007/978-3-031-25115-3_5

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13825)

Included in the following conference series:

International Workshop on Digital Watermarking

Abstract

In recent years, the rapid development of Deepfake technology has raised public concern. Existing Deepfake detection methods focus mainly on improving accuracy. However, when real-world victims need interpretable evidence to refute a forged video, an accurate classification result alone is insufficient. To mitigate this issue, we delve into forgery traces and propose a novel framework, named Find-X, that presents additional visual information as an explanation of its results. Specifically, we design a new module, named Separation Potential Inconsistency (SPI), which aims to visually explain the forgery traces in fake videos. Find-X detects Deepfakes in three stages: (1) a frequency-aware module and a spatial-aware module enhance the features; (2) a multi-scale feature extraction module extracts richer features; (3) a classification module and an SPI module output the results and the visual explanations. Our method outperforms state-of-the-art competitors on three popular benchmark datasets: FaceForensics++, Celeb-DF, and DeepFakeDetection. In addition, extensive visualization experiments on FaceForensics++ demonstrate that SPI can effectively separate the potentially inconsistent features of videos generated by five different Deepfake methods.
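
To make the three-stage structure concrete, the following is a minimal toy sketch in plain Python; it is not the authors' implementation. The abstract does not specify the filters, network layers, or how SPI works, so everything here is an assumption: a Sobel gradient stands in for the spatial-aware module, a Laplacian high-pass stands in for the frequency-aware module, average pooling stands in for multi-scale extraction, and a maximum over the coarsest map stands in for the learned classifier. The full-resolution response map is returned as a crude stand-in for a visual explanation. All function names (`filter3x3`, `toy_pipeline`, and so on) are hypothetical.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]  # one grayscale frame as rows of pixel intensities

# Hand-crafted 3x3 kernels (assumed stand-ins, not taken from the paper).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]


def filter3x3(img: Image, kernel: Sequence[Sequence[float]]) -> Image:
    """Valid-mode 3x3 cross-correlation; output shrinks by 2 in each axis."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(kernel[j][i] * img[y + j][x + i]
                for j in range(3) for i in range(3))
            for x in range(w - 2)
        ]
        for y in range(h - 2)
    ]


def spatial_features(img: Image) -> Image:
    """Stage 1a (assumed): gradient magnitude |gx| + |gy| from Sobel filters."""
    gx, gy = filter3x3(img, SOBEL_X), filter3x3(img, SOBEL_Y)
    return [[abs(a) + abs(b) for a, b in zip(ra, rb)] for ra, rb in zip(gx, gy)]


def frequency_features(img: Image) -> Image:
    """Stage 1b (assumed): absolute Laplacian response as a high-pass cue."""
    return [[abs(v) for v in row] for row in filter3x3(img, LAPLACIAN)]


def avg_pool(img: Image, k: int) -> Image:
    """Stage 2 helper: non-overlapping k x k average pooling."""
    h, w = len(img) // k, len(img[0]) // k
    return [
        [
            sum(img[y * k + j][x * k + i]
                for j in range(k) for i in range(k)) / (k * k)
            for x in range(w)
        ]
        for y in range(h)
    ]


def toy_pipeline(img: Image, scales: Tuple[int, ...] = (1, 2)) -> Tuple[float, Image]:
    """Run the three toy stages and return (score, response map)."""
    # Stage 1: combine the spatial and high-pass responses.
    spat, freq = spatial_features(img), frequency_features(img)
    enhanced = [[s + f for s, f in zip(rs, rf)] for rs, rf in zip(spat, freq)]
    # Stage 2: view the enhanced map at several scales.
    pyramid = [avg_pool(enhanced, k) for k in scales]
    # Stage 3: placeholder "classifier" (max of the coarsest map) plus the
    # finest map as a crude visual explanation.
    score = max(max(row) for row in pyramid[-1])
    return score, pyramid[0]
```

For example, a uniform frame produces a zero score, while a frame containing a sharp vertical boundary produces a non-zero score and a response map that peaks along the boundary. A real detector would replace each hand-crafted stage with learned modules trained on labeled videos.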

Acknowledgments

This work was supported by the National Key Technology Research and Development Program under Grant 2020AAA0140000.

Author information

Authors and Affiliations

State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing, 100195, China
Pengfei Pei, Xianfeng Zhao, Yun Cao & Chengqiao Hu
School of Cyber Security, University of Chinese Academy of Sciences, Beijing, 100195, China
Pengfei Pei, Xianfeng Zhao & Yun Cao

Corresponding author

Correspondence to Xianfeng Zhao.

Editor information

Editors and Affiliations

Chinese Academy of Sciences, Institute of Information Engineering, Beijing, China
Xianfeng Zhao
Guangxi Normal University, Guilin, China
Zhenjun Tang
Universidade de Vigo, Vigo, Spain
Pedro Comesaña-Alfaro
University of Florence, Florence, Italy
Alessandro Piva

About this paper

Cite this paper

Pei, P., Zhao, X., Cao, Y., Hu, C. (2023). Visual Explanations for Exposing Potential Inconsistency of Deepfakes. In: Zhao, X., Tang, Z., Comesaña-Alfaro, P., Piva, A. (eds) Digital Forensics and Watermarking. IWDW 2022. Lecture Notes in Computer Science, vol 13825. Springer, Cham. https://doi.org/10.1007/978-3-031-25115-3_5

DOI: https://doi.org/10.1007/978-3-031-25115-3_5
Published: 29 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25114-6
Online ISBN: 978-3-031-25115-3
eBook Packages: Computer Science, Computer Science (R0)
