Abstract
Detecting AI-generated images has become an extraordinarily difficult challenge as new generative architectures emerge almost daily with ever-greater capabilities and unprecedented realism. New versions of many commercial tools, such as DALL·E, Midjourney, and Stable Diffusion, have been released recently, and it is impractical to continually update and retrain supervised forensic detectors to handle such a large variety of models. To address this challenge, we propose a zero-shot entropy-based detector (ZED) that neither needs AI-generated training data nor relies on knowledge of generative architectures to artificially synthesize their artifacts. Inspired by recent work on machine-generated text detection, our idea is to measure how surprising the image under analysis is with respect to a model of real images. To this end, we rely on a lossless image encoder that estimates the probability distribution of each pixel given its context. To ensure computational efficiency, the encoder has a multi-resolution architecture, and contexts comprise mostly pixels of the lower-resolution version of the image. Since only real images are needed to learn the model, the detector is independent of generator architectures and synthetic training data. Using a single discriminative feature, the proposed detector achieves state-of-the-art performance: across a wide variety of generative models, it improves average accuracy over the previous state of the art by more than 3%. Code is available at https://grip-unina.github.io/ZED/.
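To make the scoring idea concrete, below is a minimal sketch of how a surprise score could be computed from a per-pixel probability model. It assumes a hypothetical pretrained network `pixel_model` that, given an image, returns logits of shape (1, 256, H, W), i.e., a categorical distribution over the 256 intensity levels of each pixel conditioned on its context; this interface and the scalar feature below are illustrative assumptions, not the actual ZED implementation.

```python
# Minimal sketch of an entropy-based "surprise" score, assuming a
# hypothetical pretrained model of real images (`pixel_model`) that maps
# an image to per-pixel logits over 256 intensity levels. Illustration
# of the idea only; the paper's architecture and feature may differ.
import torch
import torch.nn.functional as F

@torch.no_grad()
def surprise_score(pixel_model: torch.nn.Module, image: torch.Tensor) -> float:
    """image: (1, 1, H, W) tensor with intensities in [0, 1] (grayscale,
    following the paper's simplified notation)."""
    logits = pixel_model(image)                  # (1, 256, H, W), assumed interface
    log_probs = F.log_softmax(logits, dim=1)     # per-pixel log-distribution

    # Actual coding cost: negative log-probability of the observed pixels.
    targets = (image * 255).round().long().squeeze(1)        # (1, H, W) indices
    nll = F.nll_loss(log_probs, targets, reduction="none")   # (1, H, W)

    # Expected coding cost: entropy of the predicted distribution.
    entropy = -(log_probs.exp() * log_probs).sum(dim=1)      # (1, H, W)

    # Surprise: how far the image's cost deviates from what the model of
    # real images expects; generated images tend to deviate from it.
    return (nll - entropy).mean().item()
```

A single threshold on this scalar would then separate real from synthetic images; in the paper, efficiency comes from conditioning each pixel's distribution mostly on a lower-resolution version of the image rather than on a full autoregressive context.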
Notes
- 1. More precisely, all color components of all pixels; to simplify notation, in the following we neglect color and treat the image as if it were grayscale.
Acknowledgments
We gratefully acknowledge the support of this research by a TUM-IAS Hans Fischer Senior Fellowship, the ERC Starting Grant Scan2CAD (804724), and a Google Gift. This material is also based on research sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL) under agreement number FA8750-20-2-1004. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government. In addition, this work has received funding from the European Union under the Horizon Europe vera.ai project, Grant Agreement number 101070093.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Cozzolino, D., Poggi, G., Nießner, M., Verdoliva, L. (2025). Zero-Shot Detection of AI-Generated Images. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15076. Springer, Cham. https://doi.org/10.1007/978-3-031-72649-1_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72648-4
Online ISBN: 978-3-031-72649-1