Abstract
Given a factorization of an image into a sum of linear components, we present a zero-shot method to control each individual component through diffusion model sampling. For example, we can decompose an image into low and high spatial frequencies and condition these components on different text prompts. This produces hybrid images, which change appearance depending on viewing distance. By decomposing an image into three frequency subbands, we can generate hybrid images with three prompts. We also use a decomposition into grayscale and color components to produce images whose appearance changes when they are viewed in grayscale, a phenomenon that naturally occurs under dim lighting. We further explore a decomposition defined by a motion blur kernel, which produces images that change appearance under motion blurring. Our method works by denoising with a composite noise estimate, built from the components of noise estimates conditioned on different prompts. We also show that for certain decompositions, our method recovers prior approaches to compositional generation and spatial control. Finally, we show that we can extend our approach to generate hybrid images from real images. We do this by holding one component fixed and generating the remaining components, effectively solving an inverse problem.
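To make the mechanism concrete, below is a minimal sketch of a composite noise estimate for the low/high-frequency case, assuming two noise predictions `eps_a` and `eps_b` from the same diffusion model on the same noisy input under two different prompts. The Gaussian low-pass filter and its parameters are illustrative choices, not the paper's exact decomposition.

```python
import torch
import torchvision.transforms.functional as TF


def lowpass(x: torch.Tensor, kernel_size: int = 33, sigma: float = 3.0) -> torch.Tensor:
    """Gaussian blur acting as a linear low-pass component (illustrative)."""
    return TF.gaussian_blur(x, kernel_size=kernel_size, sigma=sigma)


def composite_noise_estimate(eps_a: torch.Tensor, eps_b: torch.Tensor) -> torch.Tensor:
    """Assemble one noise estimate from two prompt-conditioned estimates.

    Low frequencies come from the estimate conditioned on prompt A, and the
    high-frequency residual from the estimate conditioned on prompt B, so
    each prompt controls one linear component of the generated image.
    """
    low = lowpass(eps_a)           # low-frequency component of eps_a
    high = eps_b - lowpass(eps_b)  # high-frequency residual of eps_b
    return low + high              # components sum to a full noise estimate


# Example: combine two dummy estimates for a batch of 64x64 RGB images.
eps_a = torch.randn(1, 3, 64, 64)
eps_b = torch.randn(1, 3, 64, 64)
eps = composite_noise_estimate(eps_a, eps_b)
```

At each sampling step, a composite estimate of this kind would stand in for the usual single-prompt prediction in the denoising update.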
Notes
1. The update may also include adding random noise, \(\textbf{z}\sim\mathcal{N}(0,\textbf{I})\), in which case our analysis still holds with a modification to the argument, discussed in Appendix H.
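For context, a standard stochastic denoising update of this form is the DDPM ancestral sampling step of Ho et al., shown here as a reference point rather than necessarily the exact sampler used in the paper:

\[
\textbf{x}_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(\textbf{x}_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(\textbf{x}_t, t)\right) + \sigma_t \textbf{z}, \qquad \textbf{z}\sim\mathcal{N}(0,\textbf{I}),
\]

where the final \(\sigma_t \textbf{z}\) term is the added random noise in question.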
Acknowledgements
We thank Patrick Chao, Aleksander Holynski, Richard Zhang, Trenton Chang, Utkarsh Singhal, Huijie Zhang, Bowen Song, Jeongsoo Park, Jeong Joon Park, Jeffrey Fessler, Liyue Shen, Qing Qu, Antonio Torralba, and Alexei Efros for helpful discussions. We also thank Walter Scheirer, Luba Elliott, and Nicole Finn for reaching out and giving us the (amazing) opportunity to create an illusion for the CVPR 2024 T-shirt as part of the AI Art Gallery (see Appendix L). Daniel is supported by the National Science Foundation Graduate Research Fellowship under Grant No. 1841052.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Geng, D., Park, I., Owens, A. (2025). Factorized Diffusion: Perceptual Illusions by Noise Decomposition. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15115. Springer, Cham. https://doi.org/10.1007/978-3-031-72998-0_21
DOI: https://doi.org/10.1007/978-3-031-72998-0_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72997-3
Online ISBN: 978-3-031-72998-0