Abstract
This paper explores leveraging the representations of the data distribution learned by diffusion models to improve the downstream task of deepfake image detection. With the recent surge in the popularity of generative AI, disinformation has become increasingly common across modalities such as text, images, and speech; deepfake images in particular account for a significant share of this disinformative content. Effective countermeasures have relied on classifying deepfake images based on spatial irregularities, inconsistencies in high-frequency content, and fingerprint matching against known residuals of popular deepfake generation models. As the technology behind deepfakes continues to advance, however, robust detection methods and tools are needed to preserve the integrity of visual information and mitigate the risks associated with the spread of misleading or malicious content. We therefore investigate diffusion-generated reconstruction, diffusion-based latent space inversion, and high-frequency feature extraction as means of improving deepfake detection performance, adapting to the changing landscape of visual disinformation.
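As a rough illustration of the high-frequency feature extraction idea mentioned above, the sketch below (an assumption for illustration, not the paper's actual pipeline; the disc radius and function name `high_freq_energy` are arbitrary choices) computes the fraction of an image's spectral energy that lies outside a centered low-frequency region. Detectors in this family use such statistics as features, since generated images often exhibit anomalous high-frequency content.

```python
import numpy as np

def high_freq_energy(img: np.ndarray, radius_frac: float = 0.25) -> float:
    """Fraction of spectral energy outside a centered low-frequency disc.

    img: 2-D grayscale array.
    radius_frac: radius of the low-pass disc as a fraction of the smaller
    image dimension (an arbitrary choice for this sketch).
    """
    h, w = img.shape
    # Shift the 2-D FFT so the zero-frequency component sits at the center.
    spec = np.fft.fftshift(np.fft.fft2(img))
    power = np.abs(spec) ** 2
    # Distance of each frequency bin from the spectrum center.
    yy, xx = np.ogrid[:h, :w]
    r = np.hypot(yy - h / 2, xx - w / 2)
    low = r <= radius_frac * min(h, w)
    total = power.sum()
    # Share of energy carried by high-frequency bins.
    return float(power[~low].sum() / total) if total > 0 else 0.0

# A smooth image concentrates energy at low frequencies, while white
# noise spreads it across the whole spectrum.
rng = np.random.default_rng(0)
smooth = np.outer(np.hanning(64), np.hanning(64))
noisy = rng.standard_normal((64, 64))
print(high_freq_energy(smooth) < high_freq_energy(noisy))
```

In a full detector, such spectral statistics (or the spectrum itself) would be fed to a classifier alongside other cues such as diffusion reconstruction error.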
Acknowledgment
We gratefully acknowledge the support of the Computer Research Institute of Montreal (CRIM), the Ministère de l’Économie, de l’Innovation et de l’Énergie (MEIE) of Quebec, and the Natural Sciences and Engineering Research Council of Canada (NSERC).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Ganguly, R., Bah, M.D., Dahmane, M. (2025). Diffusion Models as a Representation Learner for Deepfake Image Detection. In: Antonacopoulos, A., Chaudhuri, S., Chellappa, R., Liu, CL., Bhattacharya, S., Pal, U. (eds) Pattern Recognition. ICPR 2024. Lecture Notes in Computer Science, vol 15321. Springer, Cham. https://doi.org/10.1007/978-3-031-78305-0_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78304-3
Online ISBN: 978-3-031-78305-0