Abstract
Detecting AI-generated images has become an extraordinarily difficult challenge as new generative architectures emerge almost daily with ever-greater capabilities and unprecedented realism. New versions of many commercial tools, such as DALL·E, Midjourney, and Stable Diffusion, have been released recently, and it is impractical to continually update and retrain supervised forensic detectors to handle such a large variety of models. To address this challenge, we propose a zero-shot entropy-based detector (ZED) that neither needs AI-generated training data nor relies on knowledge of generative architectures to artificially synthesize their artifacts. Inspired by recent work on machine-generated text detection, our idea is to measure how surprising the image under analysis is with respect to a model of real images. To this end, we rely on a lossless image encoder that estimates the probability distribution of each pixel given its context. To ensure computational efficiency, the encoder has a multi-resolution architecture, and contexts comprise mostly pixels of the lower-resolution version of the image. Since only real images are needed to learn the model, the detector is independent of generator architectures and synthetic training data. Using a single discriminative feature, the proposed detector achieves state-of-the-art performance: across a wide variety of generative models, it improves average accuracy over the previous state of the art by more than 3%. Code is available at https://grip-unina.github.io/ZED/.
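To make the scoring idea concrete, below is a minimal sketch of how a surprise score could be computed from a per-pixel probability model. It assumes a hypothetical pretrained network `pixel_model` that, given an image, returns logits of shape (1, 256, H, W), i.e., a categorical distribution over the 256 intensity levels of each pixel conditioned on its context; this interface and the scalar feature below are illustrative assumptions, not the actual ZED implementation.

```python
# Minimal sketch of an entropy-based "surprise" score, assuming a
# hypothetical pretrained model of real images (`pixel_model`) that maps
# an image to per-pixel logits over 256 intensity levels. Illustration
# of the idea only; the paper's architecture and feature may differ.
import torch
import torch.nn.functional as F

@torch.no_grad()
def surprise_score(pixel_model: torch.nn.Module, image: torch.Tensor) -> float:
    """image: (1, 1, H, W) tensor with intensities in [0, 1] (grayscale,
    following the paper's simplified notation)."""
    logits = pixel_model(image)                  # (1, 256, H, W), assumed interface
    log_probs = F.log_softmax(logits, dim=1)     # per-pixel log-distribution

    # Actual coding cost: negative log-probability of the observed pixels.
    targets = (image * 255).round().long().squeeze(1)        # (1, H, W) indices
    nll = F.nll_loss(log_probs, targets, reduction="none")   # (1, H, W)

    # Expected coding cost: entropy of the predicted distribution.
    entropy = -(log_probs.exp() * log_probs).sum(dim=1)      # (1, H, W)

    # Surprise: how far the image's cost deviates from what the model of
    # real images expects; generated images tend to deviate from it.
    return (nll - entropy).mean().item()
```

A single threshold on this scalar would then separate real from synthetic images; in the paper, efficiency comes from conditioning each pixel's distribution mostly on a lower-resolution version of the image rather than on a full autoregressive context.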
Notes
- 1. More precisely, all color components of all pixels; to simplify notation, in the following we neglect color and treat the image as if it were grayscale.
Acknowledgments
We gratefully acknowledge the support of this research by a TUM-IAS Hans Fischer Senior Fellowship, the ERC Starting Grant Scan2CAD (804724), and a Google Gift. This material is also based on research sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL) under agreement number FA8750-20-2-1004. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government. In addition, this work has received funding from the European Union under the Horizon Europe vera.ai project, Grant Agreement number 101070093.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Cozzolino, D., Poggi, G., Nießner, M., Verdoliva, L. (2025). Zero-Shot Detection of AI-Generated Images. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15076. Springer, Cham. https://doi.org/10.1007/978-3-031-72649-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72648-4
Online ISBN: 978-3-031-72649-1