Beta-Tuned Timestep Diffusion Model

  • Conference paper
  • Part of the proceedings: Computer Vision – ECCV 2024 (ECCV 2024)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15061)

Abstract

Diffusion models have attracted considerable attention in the field of generative modeling due to their ability to produce high-quality samples. However, several recent studies indicate that treating all distributions equally in diffusion model training is sub-optimal. In this paper, we conduct an in-depth theoretical analysis of the forward process of diffusion models. Our findings reveal that the distribution variations are non-uniform throughout the diffusion process, with the most drastic variations occurring in the initial stages. Consequently, a simple uniform timestep sampling strategy fails to align with these properties, potentially leading to sub-optimal training of diffusion models. To address this, we propose the Beta-Tuned Timestep Diffusion Model (B-TTDM), which devises a timestep sampling strategy based on the beta distribution. By choosing appropriate parameters, B-TTDM aligns the timestep sampling distribution with the properties of the forward diffusion process. Extensive experiments on different benchmark datasets validate the effectiveness of B-TTDM.

This work was done during Tianyi Zheng’s internship at vivo.
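To make the proposed sampling strategy concrete, the sketch below shows how a beta-distributed timestep sampler could replace the usual uniform draw in a DDPM-style training loop. This is a minimal illustration, not the authors' implementation: the function name sample_beta_timesteps and the parameter values alpha = 0.5, beta = 2.0 are assumptions, chosen only so that the sampling density concentrates on early timesteps, where the paper argues the forward-process distribution changes most drastically.

```python
import torch

def sample_beta_timesteps(batch_size: int, num_timesteps: int,
                          alpha: float = 0.5, beta: float = 2.0,
                          device: str = "cpu") -> torch.Tensor:
    """Draw discrete training timesteps distributed as Beta(alpha, beta).

    With alpha < 1 <= beta, the density concentrates near t = 0, i.e. the
    early forward-process steps. The parameter values here are illustrative
    assumptions, not the settings reported in the paper.
    """
    u = torch.distributions.Beta(alpha, beta).sample((batch_size,)).to(device)
    # Map the continuous draws in (0, 1) to integer timestep indices.
    return (u * num_timesteps).long().clamp(max=num_timesteps - 1)

# Hypothetical usage inside a standard DDPM training step, replacing the
# uniform draw t = torch.randint(0, T, (x0.shape[0],), device=x0.device):
#   t = sample_beta_timesteps(x0.shape[0], T, device=x0.device)
#   noise = torch.randn_like(x0)
#   x_t = sqrt_ac[t, None, None, None] * x0 \
#       + sqrt_1m_ac[t, None, None, None] * noise
#   loss = F.mse_loss(model(x_t, t), noise)
```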

Acknowledgments

This work was supported in part by NSFC under Grant 61927809 and in part by STCSM under Grant 22DZ2229005.

Author information

Correspondence to Jia Wang or Bo Li.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1776 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zheng, T. et al. (2025). Beta-Tuned Timestep Diffusion Model. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15061. Springer, Cham. https://doi.org/10.1007/978-3-031-72646-0_7

  • DOI: https://doi.org/10.1007/978-3-031-72646-0_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72645-3

  • Online ISBN: 978-3-031-72646-0

  • eBook Packages: Computer Science, Computer Science (R0)
