DOI: 10.1145/3643491.3660280
research-article
Open access

Whodunit: Detection and Attribution of Synthetic Images by Leveraging Model-specific Fingerprints

Published: 10 June 2024

Abstract

With increasingly easy access to large, pre-trained text-to-image models, a surge of synthetic images, often visually indistinguishable from natural images, can be observed. Because naturalistic synthetic images can be misidentified as natural, a general mistrust in visually conveyed information could result, especially given the misinformation that synthetic images can carry. The reverse case, misidentifying natural images as synthetic, may also contribute to this outcome. Detection and attribution of synthetic images can provide essential information about the source of an image, thus contributing to a realistic evaluation of its credibility.
In this work, several features, including the Power Spectral Density (PSD), the Discrete Cosine Transform (DCT), and the autocorrelation function (ACF), are first investigated visually and then evaluated as input features for a neural network-based classifier. The classifier is used for the detection and attribution of synthetic images, with a particular focus on attributing synthetic images to specific, differently fine-tuned versions of a pre-trained text-to-image model. The subjects of this investigation are portraits generated by large, pre-trained, diffusion-based text-to-image models, chosen for their high potential for misuse and harm. Since this is the first work to consider attribution to differently fine-tuned versions of the same model architecture, a custom dataset is created, including images generated with Midjourney and three differently fine-tuned versions of the Stable Diffusion model.
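As an illustration of the three features named above, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a single-channel (grayscale) image stored as a 2D float array; the function names and the normalisation choices are hypothetical and chosen only for readability.

```python
import numpy as np


def power_spectral_density(img):
    """2D PSD: squared magnitude of the centred 2D DFT, divided by the pixel count."""
    spectrum = np.fft.fftshift(np.fft.fft2(img))
    return np.abs(spectrum) ** 2 / img.size


def dct2(img):
    """Orthonormal 2D DCT-II, built from explicit cosine basis matrices."""

    def basis(n):
        k = np.arange(n)[:, None]  # frequency index (rows)
        x = np.arange(n)[None, :]  # sample index (columns)
        m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
        m[0, :] = np.sqrt(1.0 / n)  # DC row is scaled differently for orthonormality
        return m

    h, w = img.shape
    return basis(h) @ img @ basis(w).T


def autocorrelation(img):
    """Circular 2D autocorrelation via the Wiener-Khinchin theorem.

    The image mean is removed first; the result is normalised so that the
    zero-lag value equals 1 and shifted so that zero lag sits at the centre.
    Assumes the image is not constant (non-zero variance).
    """
    centred = img - img.mean()
    energy_spectrum = np.abs(np.fft.fft2(centred)) ** 2
    acf = np.fft.ifft2(energy_spectrum).real
    return np.fft.fftshift(acf / acf.flat[0])
```

Averaging such feature maps over many images from one source is one way to look for a source-specific pattern; per-image maps could instead be passed to a classifier. Both uses are sketched here as assumptions about the general approach, not as details taken from the paper.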
Investigating the characteristics of synthetic images reveals a bias in the average ACF, which is distinct not only between different text-to-image model architectures but also among differently fine-tuned versions of the same architecture. While this bias does not necessarily support the classification of individual images, both the DCT and the PSD prove to be well suited for robust detection and attribution with high accuracy. Even attribution to differently fine-tuned diffusion models is possible to an extent, provided the models are sufficiently different as measured by the Fréchet Inception Distance.
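The Fréchet Inception Distance compares the mean and covariance of feature vectors drawn from two image sets, using the Fréchet distance between two Gaussians: d² = ||μ_a − μ_b||² + Tr(Σ_a + Σ_b − 2(Σ_a Σ_b)^(1/2)). The sketch below computes only this distance. It assumes the feature vectors are already available as arrays of shape (n_samples, n_features); the actual metric extracts them with a pre-trained Inception network, which is omitted here. The function name is hypothetical.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two sets of feature vectors.

    feats_a, feats_b: arrays of shape (n_samples, n_features), n_features >= 2.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # covariance matrices. Small negative or imaginary parts from rounding
    # are discarded.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)
```

Two feature sets with identical statistics give a distance near zero, and shifting every feature by a constant c adds c² per feature dimension, which gives a quick sanity check for any implementation.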

      Information

      Published In

      MAD '24: Proceedings of the 3rd ACM International Workshop on Multimedia AI against Disinformation
      June 2024
      107 pages
      ISBN:9798400705526
      DOI:10.1145/3643491
      This work is licensed under a Creative Commons Attribution International 4.0 License.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. Computer Vision
      2. Diffusion Models
      3. Disinformation
      4. Image Forensics
      5. Synthetic Images

      Qualifiers

      • Research-article
      • Research
      • Refereed limited

      Funding Sources

      • BMBF

      Conference

      ICMR '24