Abstract
In recent years, the rapid development of Deepfake techniques has aroused public concern. Existing Deepfake detection methods mainly focus on improving accuracy. However, when real-world victims need interpretable results to refute a forgery, accuracy alone is insufficient. To mitigate this issue, we delve into forgery traces and propose a novel framework, named Find-X, that presents additional visual information as an explanation of its results. Specifically, we design a new module named Separation Potential Inconsistency (SPI), which aims to visually explain the forgery traces of fake videos. Find-X detects Deepfakes in three stages: (1) a frequency-aware module and a spatial-aware module enhance the features; (2) a multi-scale feature extraction module extracts richer features; (3) a classification module and an SPI module output the visual explanations. Our method outperforms state-of-the-art competitors on three popular benchmark datasets: FaceForensics++, Celeb-DF, and DeepFakeDetection. In addition, extensive visualization experiments on FaceForensics++ demonstrate that SPI can effectively separate the potentially inconsistent features of videos generated by five different Deepfake methods.
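The three-stage data flow described in the abstract can be pictured with a toy sketch. Everything concrete below is an assumption made for illustration and is not taken from the paper: a Sobel gradient stands in for the spatial-aware module, a local-mean residual for the frequency-aware module, average pooling at scales 1, 2 and 4 for multi-scale extraction, and a logistic score with hypothetical weights for the classifier. The SPI module itself is not reproduced; the enhanced map is returned only to mark where a visual explanation would be emitted.

```python
import math
from typing import List, Tuple

Image = List[List[float]]  # one grayscale frame, rows of floats


def sobel_magnitude(img: Image) -> Image:
    """Spatial stand-in (assumption): Sobel gradient magnitude on interior pixels."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            row.append(math.hypot(gx, gy))
        out.append(row)
    return out


def high_pass(img: Image) -> Image:
    """Frequency stand-in (assumption): |pixel - 3x3 local mean| on interior pixels."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            mean = sum(img[y + dy][x + dx]
                       for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
            row.append(abs(img[y][x] - mean))
        out.append(row)
    return out


def avg_pool(img: Image, k: int) -> Image:
    """Non-overlapping k x k average pooling; ragged borders are dropped."""
    h, w = len(img) // k, len(img[0]) // k
    return [[sum(img[y * k + dy][x * k + dx]
                 for dy in range(k) for dx in range(k)) / (k * k)
             for x in range(w)] for y in range(h)]


def detect(img: Image, weights=(0.5, 0.5, 0.5), bias=-1.0) -> Tuple[float, Image]:
    """Return (toy fake-score, enhanced map). Weights and bias are hypothetical."""
    # Stage 1: enhance features with the two stand-in branches, summed elementwise.
    spatial, freq = sobel_magnitude(img), high_pass(img)
    enhanced = [[s + f for s, f in zip(rs, rf)] for rs, rf in zip(spatial, freq)]
    # Stage 2: one descriptor per scale = peak response after pooling by 1, 2, 4.
    feats = [max(max(r) for r in avg_pool(enhanced, k)) for k in (1, 2, 4)]
    # Stage 3: logistic score; the enhanced map is returned alongside it.
    logit = bias + sum(w * f for w, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-logit)), enhanced


flat = [[0.5] * 8 for _ in range(8)]               # featureless frame
edge = [[0.0] * 4 + [1.0] * 4 for _ in range(8)]   # frame with a sharp vertical seam
print(detect(flat)[0], detect(edge)[0])
```

In this toy setting the frame with the sharp seam receives a higher score than the flat frame, which only reflects the hand-picked weights; a real detector would learn its features and classifier from data. Input frames need to be at least 6 x 6 pixels so that the coarsest pooling scale has something to pool.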
Acknowledgments
This work was supported by the National Key Technology Research and Development Program under Grant 2020AAA0140000.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pei, P., Zhao, X., Cao, Y., Hu, C. (2023). Visual Explanations for Exposing Potential Inconsistency of Deepfakes. In: Zhao, X., Tang, Z., Comesaña-Alfaro, P., Piva, A. (eds) Digital Forensics and Watermarking. IWDW 2022. Lecture Notes in Computer Science, vol 13825. Springer, Cham. https://doi.org/10.1007/978-3-031-25115-3_5
DOI: https://doi.org/10.1007/978-3-031-25115-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25114-6
Online ISBN: 978-3-031-25115-3
eBook Packages: Computer Science, Computer Science (R0)