
Factorized Diffusion: Perceptual Illusions by Noise Decomposition

  • Conference paper

Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15115)

Abstract

Given a factorization of an image into a sum of linear components, we present a zero-shot method to control each individual component through diffusion model sampling. For example, we can decompose an image into low and high spatial frequencies and condition these components on different text prompts. This produces hybrid images, which change appearance depending on viewing distance. By decomposing an image into three frequency subbands, we can generate hybrid images with three prompts. We also use a decomposition into grayscale and color components to produce images whose appearance changes when they are viewed in grayscale, a phenomenon that occurs naturally under dim lighting. Additionally, we explore a decomposition defined by a motion blur kernel, which produces images that change appearance under motion blurring. Our method works by denoising with a composite noise estimate built from the components of noise estimates conditioned on different prompts. We also show that, for certain decompositions, our method recovers prior approaches to compositional generation and spatial control. Finally, we show that we can extend our approach to generate hybrid images from real images. We do this by holding one component fixed and generating the remaining components, effectively solving an inverse problem.
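To make the sampling mechanism concrete, below is a minimal sketch of the composite noise estimate for the low/high-frequency (hybrid image) case. This is an illustration under stated assumptions, not the authors' released code: eps_model is a hypothetical wrapper returning a diffusion model's prompt-conditioned noise prediction, and a Gaussian blur stands in for the low-pass decomposition.

    import torch
    import torchvision.transforms.functional as TF

    def lowpass(x: torch.Tensor, sigma: float = 3.0) -> torch.Tensor:
        # Low-frequency component: Gaussian blur whose kernel spans ~3 sigma.
        k = int(2 * round(3 * sigma) + 1)  # odd kernel size
        return TF.gaussian_blur(x, kernel_size=k, sigma=sigma)

    def composite_noise_estimate(eps_model, x_t, t, prompt_low, prompt_high):
        # Factorize each noise estimate as lowpass(eps) + (eps - lowpass(eps)),
        # then take each component from the estimate conditioned on the prompt
        # that should control it.
        eps_low = eps_model(x_t, t, prompt_low)    # controls coarse structure
        eps_high = eps_model(x_t, t, prompt_high)  # controls fine detail
        return lowpass(eps_low) + (eps_high - lowpass(eps_high))

The same pattern extends to the other factorizations described above, for example by swapping the low-pass filter for a grayscale projection or a motion blur kernel.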


Notes

  1. The update may also include adding random noise, \(\textbf{z}\sim \mathcal {N}(0,\textbf{I})\), in which case our analysis still holds with a modification to the argument, discussed in Appendix H.
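For reference, in DDPM-style ancestral sampling (Ho et al., 2020) the update with this added noise term takes the standard form

\[ \textbf{x}_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(\textbf{x}_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\boldsymbol{\epsilon}_\theta(\textbf{x}_t, t)\right) + \sigma_t\textbf{z}, \qquad \textbf{z}\sim\mathcal{N}(\textbf{0},\textbf{I}), \]

where \(\sigma_t\textbf{z}\) is the random-noise term in question; setting \(\sigma_t = 0\) recovers a deterministic (DDIM-style) update.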


Acknowledgements

We thank Patrick Chao, Aleksander Holynski, Richard Zhang, Trenton Chang, Utkarsh Singhal, Huijie Zhang, Bowen Song, Jeongsoo Park, Jeong Joon Park, Jeffrey Fessler, Liyue Shen, Qing Qu, Antonio Torralba, and Alexei Efros for helpful discussions. We also thank Walter Scheirer, Luba Elliott, and Nicole Finn for reaching out and giving us the (amazing) opportunity to create an illusion for the CVPR 2024 T-shirt as part of the AI Art Gallery (see Appendix L). Daniel is supported by the National Science Foundation Graduate Research Fellowship under Grant No. 1841052.

Author information

Correspondence to Daniel Geng or Inbum Park.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 9527 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Geng, D., Park, I., Owens, A. (2025). Factorized Diffusion: Perceptual Illusions by Noise Decomposition. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15115. Springer, Cham. https://doi.org/10.1007/978-3-031-72998-0_21


  • DOI: https://doi.org/10.1007/978-3-031-72998-0_21


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72997-3

  • Online ISBN: 978-3-031-72998-0

  • eBook Packages: Computer Science, Computer Science (R0)
