Beta-Tuned Timestep Diffusion Model

  • Conference paper
  • Part of the proceedings: Computer Vision – ECCV 2024 (ECCV 2024)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15061)

Abstract

Diffusion models have attracted considerable attention in the field of generative modeling due to their ability to produce high-quality samples. However, several recent studies indicate that treating all distributions equally in diffusion model training is sub-optimal. In this paper, we conduct an in-depth theoretical analysis of the forward process of diffusion models. Our findings reveal that the distribution variations are non-uniform throughout the diffusion process, with the most drastic variations occurring in the initial stages. Consequently, a simple uniform timestep sampling strategy fails to align with these properties, potentially leading to sub-optimal training of diffusion models. To address this, we propose the Beta-Tuned Timestep Diffusion Model (B-TTDM), which devises a timestep sampling strategy based on the beta distribution. By choosing appropriate parameters, B-TTDM aligns the timestep sampling distribution with the properties of the forward diffusion process. Extensive experiments on different benchmark datasets validate the effectiveness of B-TTDM.

This work was done during Tianyi Zheng’s internship at vivo.
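To make the proposed sampling strategy concrete, the sketch below shows how a beta-distributed timestep sampler could replace the usual uniform draw in a DDPM-style training loop. This is a minimal illustration, not the authors' implementation: the function name sample_beta_timesteps and the parameter values alpha = 0.5, beta = 2.0 are assumptions, chosen only so that the sampling density concentrates on early timesteps, where the paper argues the forward-process distribution changes most drastically.

```python
import torch

def sample_beta_timesteps(batch_size: int, num_timesteps: int,
                          alpha: float = 0.5, beta: float = 2.0,
                          device: str = "cpu") -> torch.Tensor:
    """Draw discrete training timesteps distributed as Beta(alpha, beta).

    With alpha < 1 <= beta, the density concentrates near t = 0, i.e. the
    early forward-process steps. The parameter values here are illustrative
    assumptions, not the settings reported in the paper.
    """
    u = torch.distributions.Beta(alpha, beta).sample((batch_size,)).to(device)
    # Map the continuous draws in (0, 1) to integer timestep indices.
    return (u * num_timesteps).long().clamp(max=num_timesteps - 1)

# Hypothetical usage inside a standard DDPM training step, replacing the
# uniform draw t = torch.randint(0, T, (x0.shape[0],), device=x0.device):
#   t = sample_beta_timesteps(x0.shape[0], T, device=x0.device)
#   noise = torch.randn_like(x0)
#   x_t = sqrt_ac[t, None, None, None] * x0 \
#       + sqrt_1m_ac[t, None, None, None] * noise
#   loss = F.mse_loss(model(x_t, t), noise)
```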

Acknowledgments

This work was supported in part by NSFC under Grant 61927809 and in part by STCSM under Grant 22DZ2229005.

Author information

Correspondence to Jia Wang or Bo Li.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1776 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zheng, T. et al. (2025). Beta-Tuned Timestep Diffusion Model. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15061. Springer, Cham. https://doi.org/10.1007/978-3-031-72646-0_7

  • DOI: https://doi.org/10.1007/978-3-031-72646-0_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72645-3

  • Online ISBN: 978-3-031-72646-0

  • eBook Packages: Computer Science, Computer Science (R0)
