Abstract
The deepfake technique replaces the face in a source video with a fake face generated by deep learning tools such as generative adversarial networks (GANs). Even the facial expressions can be well synchronized, making fake videos difficult to identify. Using features from multiple domains has proven effective in the literature. It is also known that temporal information is particularly critical in detecting deepfake videos, since the face-swapping of a video is implemented frame by frame. In this paper, we argue that the temporal differences between authentic and fake videos are complex and cannot be adequately captured at a single time scale. To obtain a complete picture of the temporal deepfake traces, we design a detection model with a short-term feature extraction module and a long-term feature extraction module. The short-term module captures the gradient information of adjacent frames, which is combined with frequency and spatial information to form a multi-domain feature set. The long-term module then reveals artifacts over a longer period of context. The proposed algorithm is tested on several popular databases, namely FaceForensics++, DeepfakeDetection (DFD), TIMIT-DF and FFW. Experimental results validate the effectiveness of our algorithm through improved detection performance compared with related works.
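To make the idea of looking at more than one time scale concrete, the following is a minimal sketch and not the detection model itself, which is a learned network whose details the abstract does not give. It computes plain pixel-wise differences between frames that are a chosen number of steps apart: a stride of 1 stands in for the adjacent-frame (short-term) gradient, and a larger stride is a crude stand-in for a longer temporal context. The names `frame_difference`, `temporal_differences`, `mean_abs`, and the toy clip are all hypothetical and chosen for illustration only.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame as rows of pixel intensities


def frame_difference(a: Frame, b: Frame) -> Frame:
    """Pixel-wise difference b - a between two frames of equal size."""
    return [[pb - pa for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def temporal_differences(frames: List[Frame], stride: int) -> List[Frame]:
    """Differences between frames that are `stride` steps apart.

    stride=1 gives adjacent-frame (short-term) differences; a larger
    stride compares frames across a longer span of the clip.
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return [frame_difference(frames[i], frames[i + stride])
            for i in range(len(frames) - stride)]


def mean_abs(frame: Frame) -> float:
    """Mean absolute value of a frame, a scalar summary of its change."""
    values = [abs(v) for row in frame for v in row]
    return sum(values) / len(values)


# Toy clip (assumed data): four 2x2 frames whose brightness rises by 1 per step.
clip = [[[float(t), float(t)], [float(t), float(t)]] for t in range(4)]

short_term = [mean_abs(d) for d in temporal_differences(clip, stride=1)]
long_term = [mean_abs(d) for d in temporal_differences(clip, stride=3)]
```

In a real pipeline these differences would be computed on aligned face crops with an array library and fed to learned feature extractors; the sketch only shows how the same clip yields different signals at different temporal strides.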
Funding
This work is supported by the National Key Research and Development Project under Grant 2019QY2202, the China-Singapore International Joint Research Institute under Grant 206-A018001, and the Science and Technology Foundation of Guangzhou Huangpu Development District under Grant 2019GH16.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hu, Y., Zhao, H., Yu, Z., Liu, B., Yu, X. (2022). Exposing Deepfake Videos with Spatial, Frequency and Multi-scale Temporal Artifacts. In: Zhao, X., Piva, A., Comesaña-Alfaro, P. (eds) Digital Forensics and Watermarking. IWDW 2021. Lecture Notes in Computer Science, vol 13180. Springer, Cham. https://doi.org/10.1007/978-3-030-95398-0_4
DOI: https://doi.org/10.1007/978-3-030-95398-0_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95397-3
Online ISBN: 978-3-030-95398-0
eBook Packages: Computer Science, Computer Science (R0)