Zero-Shot Detection of AI-Generated Images

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Detecting AI-generated images has become an extraordinarily difficult challenge as new generative architectures emerge almost daily, with ever greater capabilities and unprecedented realism. New versions of many commercial tools, such as DALL·E, Midjourney, and Stable Diffusion, have been released recently, and it is impractical to continually update and retrain supervised forensic detectors to handle such a large variety of models. To address this challenge, we propose a zero-shot entropy-based detector (ZED) that neither needs AI-generated training data nor relies on knowledge of generative architectures to artificially synthesize their artifacts. Inspired by recent works on machine-generated text detection, our idea is to measure how surprising the image under analysis is compared to a model of real images. To this end, we rely on a lossless image encoder that estimates the probability distribution of each pixel given its context. To ensure computational efficiency, the encoder has a multi-resolution architecture, and contexts comprise mostly pixels of the lower-resolution version of the image. Since only real images are needed to learn the model, the detector is independent of generator architectures and synthetic training data. Using a single discriminative feature, the proposed detector achieves state-of-the-art performance: on a wide variety of generative models, it improves accuracy over the SoTA by more than 3% on average. Code is available at https://grip-unina.github.io/ZED/.
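The idea of measuring "how surprising" an image is under a model of real images can be illustrated with a small sketch. It is not the authors' implementation (their code is at the URL above). It assumes that some encoder has already produced, for every pixel, a probability distribution over the possible values given that pixel's context. The function names, the toy 4-level alphabet, and the choice of comparing the actual coding cost with the model's expected cost (its entropy) are all assumptions made here for illustration.

```python
import math


def coding_cost_bits(probs, pixels):
    """Mean number of bits needed to code the observed pixels,
    i.e. the average of -log2 p(x_i | context_i)."""
    total = 0.0
    for dist, value in zip(probs, pixels):
        # Clamp to avoid log(0) if the model assigns zero probability.
        total += -math.log2(max(dist[value], 1e-12))
    return total / len(pixels)


def expected_cost_bits(probs):
    """Mean entropy of the predicted distributions: the coding cost the
    model itself expects, regardless of the observed pixel values."""
    total = 0.0
    for dist in probs:
        total += -sum(p * math.log2(p) for p in dist if p > 0.0)
    return total / len(probs)


def surprise_gap(probs, pixels):
    """Hypothetical scalar feature: actual minus expected coding cost."""
    return coding_cost_bits(probs, pixels) - expected_cost_bits(probs)


# Toy data: three pixels, each with a predicted distribution over 4 levels.
probs = [
    [0.70, 0.10, 0.10, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.10, 0.80, 0.05, 0.05],
]
likely_pixels = [0, 2, 1]    # values the model considers probable
unlikely_pixels = [3, 2, 2]  # values the model considers improbable

print(surprise_gap(probs, likely_pixels))
print(surprise_gap(probs, unlikely_pixels))
```

In this toy setting the gap is smaller when the observed pixels match what the model predicts and larger when they do not. How such a statistic is computed across resolutions and turned into a decision is specified in the paper itself, not in this sketch.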

Notes

  1. More precisely, all color components of all pixels; to simplify notation, in the following we neglect color and treat the image as if it were grayscale.

Acknowledgments

We gratefully acknowledge the support of this research by a TUM-IAS Hans Fischer Senior Fellowship, the ERC Starting Grant Scan2CAD (804724), and a Google Gift. This material is also based on research sponsored by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL) under agreement number FA8750-20-2-1004. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government. In addition, this work has received funding from the European Union under the Horizon Europe vera.ai project, Grant Agreement number 101070093.

Author information

Corresponding author

Correspondence to Luisa Verdoliva.

Electronic supplementary material

Supplementary material 1 (PDF, 882 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Cozzolino, D., Poggi, G., Nießner, M., Verdoliva, L. (2025). Zero-Shot Detection of AI-Generated Images. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15076. Springer, Cham. https://doi.org/10.1007/978-3-031-72649-1_4

  • DOI: https://doi.org/10.1007/978-3-031-72649-1_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72648-4

  • Online ISBN: 978-3-031-72649-1

  • eBook Packages: Computer Science, Computer Science (R0)
