Abstract
Diffusion models have attracted significant attention in generative modeling due to their ability to produce high-quality samples. However, several recent studies indicate that treating all timesteps equally in diffusion model training is sub-optimal. In this paper, we conduct an in-depth theoretical analysis of the forward process of diffusion models. Our findings reveal that distribution variations are non-uniform throughout the diffusion process, with the most drastic variations occurring in the initial stages. Consequently, a simple uniform timestep sampling strategy fails to align with these properties, potentially leading to sub-optimal training of diffusion models. To address this, we propose the Beta-Tuned Timestep Diffusion Model (B-TTDM), which devises a timestep sampling strategy based on the beta distribution. By choosing appropriate parameters, B-TTDM aligns the timestep sampling distribution with the properties of the forward diffusion process. Extensive experiments on different benchmark datasets validate the effectiveness of B-TTDM.
This work was done during Tianyi Zheng’s internship at vivo.
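To make the sampling strategy concrete, below is a minimal PyTorch sketch of beta-distributed timestep sampling inside a DDPM-style training step. It is an illustrative sketch only: the concentration parameters, the helper name `sample_timesteps`, and the commented training lines are assumptions for exposition, not the parameter choice that B-TTDM derives from its forward-process analysis.

```python
import torch

T = 1000  # number of discrete diffusion timesteps

# Beta(a, b) with a < b concentrates probability mass near t = 0,
# matching the observation that the sharpest distribution changes
# occur early in the forward process. These values are placeholders,
# not the parameters derived in the paper.
timestep_dist = torch.distributions.Beta(concentration1=0.5, concentration0=1.5)

def sample_timesteps(batch_size: int) -> torch.Tensor:
    """Draw timesteps in {0, ..., T-1} from a beta distribution
    instead of the usual uniform distribution."""
    u = timestep_dist.sample((batch_size,))   # continuous u in (0, 1)
    return (u * T).long().clamp(max=T - 1)    # map to integer steps

# Inside a standard DDPM training step (all names hypothetical):
#   t    = sample_timesteps(x0.shape[0])
#   x_t  = sqrt_alpha_bar[t, None, None, None] * x0 \
#        + sqrt_one_minus_alpha_bar[t, None, None, None] * noise
#   loss = F.mse_loss(model(x_t, t), noise)
```

In this reading, the strategy amounts to replacing the uniform `torch.randint(0, T, (batch_size,))` draw of a standard DDPM trainer with the beta-distributed sampler above; the rest of the training loop is unchanged.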
Acknowledgments
This work was supported in part by NSFC under Grant 61927809 and in part by STCSM under Grant 22DZ2229005.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Zheng, T. et al. (2025). Beta-Tuned Timestep Diffusion Model. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15061. Springer, Cham. https://doi.org/10.1007/978-3-031-72646-0_7