Abstract
Current research on text-to-image generation has reached a level comparable to that of ordinary painters, but it still leaves considerable room for improvement relative to the level of artists. Artist-level paintings typically fuse the features of multiple images of imagery into a single image to express multilevel semantic information. In a pre-experiment, we confirmed this observation and consulted three groups with different levels of art appreciation ability to identify what distinguishes painter-level from artist-level work. These opinions were then used to help an artificial intelligence painting system improve from painter-level to artist-level image generation. Specifically, we propose a text-based multistage guidance method, requiring no further pretraining, that helps the diffusion model move toward multilevel semantic representation in the generated images. Both machine and human evaluations in the experiments validate the effectiveness of the proposed method. Moreover, unlike previous single-stage guidance methods, our method can control the degree to which each image feature of imagery is expressed in the painting by controlling the number of guidance steps assigned to each stage.
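The core idea described above — partitioning the denoising trajectory into stages, each guided by a different text prompt, with the per-stage step count controlling how strongly that stage's imagery appears — can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the function names, the toy `denoise_step`, and the example prompts are all hypothetical, and a real system would plug the schedule into an actual diffusion sampler (e.g., a DDIM-style loop).

```python
import numpy as np


def multistage_guidance_schedule(prompts, steps_per_stage, total_steps):
    """Assign a text prompt to each denoising step.

    prompts:         one prompt per stage (e.g., one per piece of imagery).
    steps_per_stage: denoising steps given to each stage; a larger count
                     makes that stage's imagery more prominent in the result.
    """
    assert len(prompts) == len(steps_per_stage)
    assert sum(steps_per_stage) == total_steps
    schedule = []
    for prompt, n in zip(prompts, steps_per_stage):
        schedule.extend([prompt] * n)
    return schedule


def sample_with_multistage_guidance(denoise_step, x_T, schedule):
    """Generic denoising loop that switches the text condition per step.

    denoise_step(x, t, prompt) stands in for one reverse-diffusion step
    conditioned on `prompt`; x_T is the initial noise sample.
    """
    x = x_T
    for t, prompt in zip(reversed(range(len(schedule))), schedule):
        x = denoise_step(x, t, prompt)
    return x
```

Shifting steps between stages (say, from 30/20 to 40/10) is the knob the abstract refers to: it trades prominence between the two pieces of imagery without any retraining.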
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Author information
Contributions
Taihao LI designed the research. Wang QI and Huanghuang DENG developed the methodology, collected the data, and worked on the software. Wang QI drafted the paper. Huanghuang DENG helped organize the paper. All the authors revised and finalized the paper.
Ethics declarations
Wang QI, Huanghuang DENG, and Taihao LI declare that they have no conflict of interest.
Cite this article
Qi, W., Deng, H. & Li, T. Multistage guidance on the diffusion model inspired by human artists’ creative thinking. Front Inform Technol Electron Eng 25, 170–178 (2024). https://doi.org/10.1631/FITEE.2300313