Abstract
Generalization is essential when face swapping detectors are deployed in practical applications. Although most existing methods achieve accurate detection of known forgeries, they fail when confronted with unseen face manipulation methods. To alleviate this problem, we propose a novel and practical framework, Detection based on Identity Spatial Constraints with Weighted Frequency Division (DISC-WFD), which introduces a reference image and consists of a backbone network, a shared Identity Semantic Encoder (ISE), and the corresponding Identity Spatial Constraint (ISC) branches. The ISE measures the identity similarity between the input image and the reference image and generates identity spatial constraints. These constraints are imposed on the ISC branches so that they focus on identity-related regions in both the high-frequency and low-frequency components, where the discriminative information lies. The proposed method significantly improves detection performance and generalization against unseen manipulation methods. Furthermore, cross-dataset experiments validate the superiority and effectiveness of DISC-WFD.
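To make the two ideas named in the abstract concrete, the sketch below is a minimal, hypothetical PyTorch illustration of (i) splitting a face image into low- and high-frequency components with an FFT mask and (ii) deriving a spatial constraint map from the identity similarity between the input and a reference image. The module names, the radius-based frequency split, and the similarity-to-weight mapping are assumptions made for illustration only; they do not reproduce the authors' implementation.

```python
# Hypothetical sketch of the DISC-WFD idea described in the abstract.
# All module names, shapes, and the weighting scheme are illustrative
# assumptions, not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def weighted_frequency_division(x, radius=0.1):
    """Split an image batch into low- and high-frequency parts with an FFT mask.

    x: (B, C, H, W) tensor in [0, 1]. `radius` (fraction of the spectrum kept
    as "low frequency") is a hypothetical hyper-parameter.
    """
    B, C, H, W = x.shape
    spec = torch.fft.fftshift(torch.fft.fft2(x), dim=(-2, -1))
    yy, xx = torch.meshgrid(
        torch.arange(H, device=x.device), torch.arange(W, device=x.device),
        indexing="ij")
    dist = torch.sqrt((yy - H / 2) ** 2 + (xx - W / 2) ** 2)
    low_mask = (dist <= radius * min(H, W)).float()
    low = torch.fft.ifft2(torch.fft.ifftshift(spec * low_mask, dim=(-2, -1))).real
    high = torch.fft.ifft2(torch.fft.ifftshift(spec * (1 - low_mask), dim=(-2, -1))).real
    return low, high


class IdentitySpatialConstraint(nn.Module):
    """Toy stand-in for the ISE/ISC idea: compare input and reference identity
    features and turn their similarity into a spatial weighting map."""

    def __init__(self, feat_dim=128):
        super().__init__()
        # Placeholder identity encoder; in practice a pretrained face
        # recognition backbone would be used here.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1))

    def forward(self, x, ref):
        fx, fr = self.encoder(x), self.encoder(ref)          # (B, D, h, w)
        sim = F.cosine_similarity(fx, fr, dim=1, eps=1e-6)   # (B, h, w)
        constraint = 1.0 - sim.clamp(-1, 1)                  # dissimilar -> higher weight
        return constraint.unsqueeze(1)                       # (B, 1, h, w)


if __name__ == "__main__":
    img = torch.rand(2, 3, 128, 128)
    ref = torch.rand(2, 3, 128, 128)
    low, high = weighted_frequency_division(img)
    constraint = IdentitySpatialConstraint()(img, ref)
    # In the full framework the constraint map would modulate the backbone
    # features of the low- and high-frequency branches before classification.
    print(low.shape, high.shape, constraint.shape)
```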





Acknowledgements
This work was supported by the National Key Research and Development Program of China (Grant No. 2020AAA0106800), the National Natural Science Foundation of China (Grant Nos. 62192785, 61902401, 61972071, U1936204, 62122086, 62036011, 62192782, and 61721004), the Beijing Natural Science Foundation (No. M22005), and the CAS Key Research Program of Frontier Sciences (Grant No. QYZDJ-SSW-JSC040). The work of Bing Li was also supported by the Youth Innovation Promotion Association, CAS.
Author information
Authors and Affiliations
Contributions
All authors contributed to extensive discussions and carefully reviewed the manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ai, Z., Peng, C., Jiang, J. et al. Face swapping detection based on identity spatial constraints with weighted frequency division. Multimedia Systems 29, 627–640 (2023). https://doi.org/10.1007/s00530-022-01007-4