Abstract
The significance of face forgery detection has grown substantially with the emergence of facial manipulation technologies. Recent methods have turned to face forgery detection in the spatial-frequency domain, improving overall performance. Nonetheless, these methods are still not guaranteed to cover the full range of forgery technologies, and networks trained on public datasets struggle to quantify their uncertainty accurately. In this work, we design a Dynamic Dual-spectrum Interaction Network that supports test-time training with uncertainty guidance and spatial-frequency prompt learning. RGB and frequency features first interact at multiple levels through a Frequency-guided Attention Module, and these multi-modal features are then merged by a Dynamic Fusion Module. Since uncertain data biases the fusion weights during dynamic fusion, we further exploit uncertainty perturbation as guidance during the test-time training phase. Furthermore, we propose a spatial-frequency prompt learning method that effectively enhances the generalization of the forgery detection model. Finally, we curate a novel, extensive dataset containing images synthesized by various diffusion and non-diffusion methods. Comprehensive experiments show that our method achieves more appealing results for face forgery detection than recent state-of-the-art methods.
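The interplay of frequency-domain features, dynamic fusion, and an uncertainty bias on the fusion weights can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the use of a global 2-D FFT for the frequency branch, and the scalar uncertainty bias are illustrative assumptions.

```python
import numpy as np

def frequency_features(img):
    """Toy frequency branch: log-scaled 2-D FFT magnitude of an image."""
    spec = np.fft.fftshift(np.fft.fft2(img))
    return np.log1p(np.abs(spec))

def dynamic_fusion(rgb_feat, freq_feat, uncertainty, tau=1.0):
    """Merge RGB and frequency features with softmax fusion weights.

    A higher scalar `uncertainty` (hypothetical: distrust of the RGB
    branch) lowers the RGB score before the softmax, shifting weight
    toward the frequency branch -- a simplified stand-in for the
    uncertainty-guided bias described in the abstract.
    """
    scores = np.array([rgb_feat.mean() - uncertainty,
                       freq_feat.mean()]) / tau
    w = np.exp(scores - scores.max())   # numerically stable softmax
    w /= w.sum()
    return w[0] * rgb_feat + w[1] * freq_feat, w
```

Under this sketch, raising `uncertainty` monotonically decreases the RGB fusion weight, so a sample the model is unsure about leans more heavily on frequency cues.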





Acknowledgements
This work is partially funded by the National Natural Science Foundation of China (Grant Nos. U21B2045, U20A20223, 32341009, 62206277), Youth Innovation Promotion Association CAS (Grant No. 2022132), and Beijing Nova Program (20230484276).
Author information
Authors and Affiliations
Corresponding author
Additional information
Communicated by Sergio Escalera.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Duan, J., Ai, Y., Liu, J. et al. Test-time Forgery Detection with Spatial-Frequency Prompt Learning. Int J Comput Vis 133, 672–687 (2025). https://doi.org/10.1007/s11263-024-02208-2