Abstract
Given a factorization of an image into a sum of linear components, we present a zero-shot method to control each individual component through diffusion model sampling. For example, we can decompose an image into low and high spatial frequencies and condition these components on different text prompts. This produces hybrid images, which change appearance depending on viewing distance. By decomposing an image into three frequency subbands, we can generate hybrid images with three prompts. We also use a decomposition into grayscale and color components to produce images whose appearance changes when they are viewed in grayscale, a phenomenon that naturally occurs under dim lighting. We further explore a decomposition defined by a motion blur kernel, which produces images that change appearance under motion blurring. Our method works by denoising with a composite noise estimate, built from the components of noise estimates conditioned on different prompts. We also show that for certain decompositions, our method recovers prior approaches to compositional generation and spatial control. Finally, we show that we can extend our approach to generate hybrid images from real images. We do this by holding one component fixed and generating the remaining components, effectively solving an inverse problem.
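To make the mechanism concrete, below is a minimal sketch of a composite noise estimate for the low/high-frequency case, assuming two noise predictions `eps_a` and `eps_b` from the same diffusion model on the same noisy input under two different prompts. The Gaussian low-pass filter and its parameters are illustrative choices, not the paper's exact decomposition.

```python
import torch
import torchvision.transforms.functional as TF


def lowpass(x: torch.Tensor, kernel_size: int = 33, sigma: float = 3.0) -> torch.Tensor:
    """Gaussian blur acting as a linear low-pass component (illustrative)."""
    return TF.gaussian_blur(x, kernel_size=kernel_size, sigma=sigma)


def composite_noise_estimate(eps_a: torch.Tensor, eps_b: torch.Tensor) -> torch.Tensor:
    """Assemble one noise estimate from two prompt-conditioned estimates.

    Low frequencies come from the estimate conditioned on prompt A, and the
    high-frequency residual from the estimate conditioned on prompt B, so
    each prompt controls one linear component of the generated image.
    """
    low = lowpass(eps_a)           # low-frequency component of eps_a
    high = eps_b - lowpass(eps_b)  # high-frequency residual of eps_b
    return low + high              # components sum to a full noise estimate


# Example: combine two dummy estimates for a batch of 64x64 RGB images.
eps_a = torch.randn(1, 3, 64, 64)
eps_b = torch.randn(1, 3, 64, 64)
eps = composite_noise_estimate(eps_a, eps_b)
```

At each sampling step, a composite estimate of this kind would stand in for the usual single-prompt prediction in the denoising update.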
Notes
1. The update may also include adding random noise, \(\textbf{z}\sim\mathcal{N}(0,\textbf{I})\), in which case our analysis still holds with a modification to the argument, discussed in Appendix H.
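For context, a standard stochastic denoising update of this form is the DDPM ancestral sampling step of Ho et al., shown here as a reference point rather than necessarily the exact sampler used in the paper:

\[
\textbf{x}_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(\textbf{x}_t - \frac{1-\alpha_t}{\sqrt{1-\bar{\alpha}_t}}\,\epsilon_\theta(\textbf{x}_t, t)\right) + \sigma_t \textbf{z}, \qquad \textbf{z}\sim\mathcal{N}(0,\textbf{I}),
\]

where the final \(\sigma_t \textbf{z}\) term is the added random noise in question.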
Acknowledgements
We thank Patrick Chao, Aleksander Holynski, Richard Zhang, Trenton Chang, Utkarsh Singhal, Huijie Zhang, Bowen Song, Jeongsoo Park, Jeong Joon Park, Jeffrey Fessler, Liyue Shen, Qing Qu, Antonio Torralba, and Alexei Efros for helpful discussions. We also thank Walter Scheirer, Luba Elliott, and Nicole Finn for reaching out and giving us the (amazing) opportunity to create an illusion for the CVPR 2024 T-shirt as part of the AI Art Gallery (see Appendix L). Daniel is supported by the National Science Foundation Graduate Research Fellowship under Grant No. 1841052.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Geng, D., Park, I., Owens, A. (2025). Factorized Diffusion: Perceptual Illusions by Noise Decomposition. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15115. Springer, Cham. https://doi.org/10.1007/978-3-031-72998-0_21
DOI: https://doi.org/10.1007/978-3-031-72998-0_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72997-3
Online ISBN: 978-3-031-72998-0