Whodunit: Detection and Attribution of Synthetic Images by Leveraging Model-specific Fingerprints

Authors: Alexander Wißmann, Steffen Zeiler, Robert M. Nickel, Dorothea Kolossa

MAD '24: Proceedings of the 3rd ACM International Workshop on Multimedia AI against Disinformation

Pages 65 - 72

https://doi.org/10.1145/3643491.3660280

Published: 10 June 2024

Abstract

With increasingly easy access to large, pre-trained text-to-image models, a surge of synthetic images, often visually indistinguishable from natural images, can be observed. Because naturalistic synthetic images can be misidentified as natural, a general mistrust in visually conveyed information could result, especially considering the misinformation that synthetic images can carry. The reverse case, misidentifying natural images as synthetic, may also contribute to this outcome. Detection and attribution of synthetic images can provide essential information about the source of an image, thus contributing to a realistic evaluation of its credibility.

In this work, several features, including the Power Spectral Density (PSD), the Discrete Cosine Transform (DCT), and the autocorrelation function (ACF), are first investigated visually and then evaluated as input features for a neural network-based classifier. The classifier is used for the detection and attribution of synthetic images, with a particular focus on attributing synthetic images to specific, differently fine-tuned versions of a pre-trained text-to-image model. The subjects of this investigation are portraits generated by large, pre-trained, diffusion-based text-to-image models, chosen for their high potential for misuse and harm. Since this is the first work to consider attribution to differently fine-tuned versions of the same model architecture, a custom dataset is created, including images generated with Midjourney and three differently fine-tuned versions of the Stable Diffusion model.
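To make the three feature types concrete, the following is a minimal sketch of how a PSD, a 2-D DCT, and an ACF could be computed for a single grayscale image with NumPy. It is an illustration under stated assumptions, not the authors' pipeline: the abstract does not specify the preprocessing (colour handling, image size, log scaling, normalisation), and the function names `power_spectral_density`, `dct_matrix`, `dct2`, and `autocorrelation` are hypothetical.

```python
import numpy as np


def power_spectral_density(img):
    """Periodogram estimate of the 2-D PSD (zero frequency shifted to the centre).

    Assumption: the mean is removed first; the paper's exact estimator is unknown.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(img - img.mean()))
    return np.abs(spectrum) ** 2 / img.size


def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0, :] /= np.sqrt(2.0)  # rescale the DC row so the matrix is orthonormal
    return basis


def dct2(img):
    """Separable 2-D DCT-II: transform the rows, then the columns."""
    rows, cols = img.shape
    return dct_matrix(rows) @ img @ dct_matrix(cols).T


def autocorrelation(img):
    """Circular 2-D autocorrelation via the Wiener-Khinchin relation.

    The result is normalised so that the zero-lag value is 1 and shifted so
    that zero lag sits at index (rows // 2, cols // 2). A constant image has
    zero variance and would cause a division by zero here.
    """
    centred = img - img.mean()
    power = np.abs(np.fft.fft2(centred)) ** 2
    acf = np.fft.ifft2(power).real
    return np.fft.fftshift(acf / acf.flat[0])


# Example with random data standing in for a grayscale portrait.
rng = np.random.default_rng(0)
image = rng.random((32, 48))
psd = power_spectral_density(image)
dct = dct2(image)
acf = autocorrelation(image)
```

Each function returns an array with the same shape as the input, so any of the three could be passed to a classifier as a single-channel feature map. An average ACF over a set of images, as discussed in the abstract, would be the element-wise mean of `autocorrelation` outputs for equally sized images.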

Investigating the characteristics of synthetic images reveals a bias in the average ACF, which is distinct not only between different text-to-image model architectures but also among differently fine-tuned versions of the same architecture. While this bias does not necessarily support the classification of individual images, both the DCT and the PSD prove to be well suited for robust detection and attribution with high accuracy. Even attribution to differently fine-tuned diffusion models is possible to an extent, provided the models are sufficiently different as measured by the Fréchet Inception Distance.
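For reference, the Fréchet Inception Distance between two image sets is commonly defined as below. The symbols are introduced here for illustration: \mu_1, \Sigma_1 and \mu_2, \Sigma_2 denote the mean and covariance of Inception-network feature embeddings of the two sets. The abstract does not state which implementation or feature layer the authors used.

```latex
\mathrm{FID} = \lVert \mu_1 - \mu_2 \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_1 + \Sigma_2 - 2\,(\Sigma_1 \Sigma_2)^{1/2} \right)
```

A value of zero means the two embedding distributions have identical means and covariances; larger values indicate models whose outputs differ more.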

