
Energy-Calibrated VAE with Test Time Free Lunch

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15143)


Abstract

In this paper, we propose a novel generative model that uses a conditional Energy-Based Model (EBM) to enhance the Variational Autoencoder (VAE), termed Energy-Calibrated VAE (EC-VAE). VAEs often produce blurry samples because training provides no tailored supervision for samples drawn in the generative direction, whereas EBMs can generate high-quality samples but require expensive Markov Chain Monte Carlo (MCMC) sampling. To address both issues, we introduce a conditional EBM that calibrates the generative direction of the VAE during training, without being required for generation at test time. In particular, we train EC-VAE on both the input data and the calibrated samples with an adaptive weight, which improves efficacy while avoiding MCMC sampling at test time. Furthermore, we extend the calibration idea of EC-VAE to variational learning and normalizing flows, and apply EC-VAE to an additional application: zero-shot image restoration via a neural transport prior and range-null space theory. We evaluate the proposed method on two applications, image generation and zero-shot image restoration, and the experimental results show that it achieves competitive performance in single-step non-adversarial generation.
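The sketch below illustrates the general training pattern the abstract describes: a VAE trained on its usual ELBO terms plus an extra term on EBM-calibrated decoder samples, with MCMC confined to training. It is a minimal illustration under stated assumptions, not the paper's implementation: the toy MLP architectures, conditioning the EBM on the latent code, the fixed weight `lam` (standing in for the adaptive weight), and the generic contrastive EBM objective are all simplifications introduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VAE(nn.Module):
    """A small MLP VAE (toy architecture for illustration only)."""
    def __init__(self, x_dim=784, z_dim=64, h=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h), nn.ReLU(), nn.Linear(h, 2 * z_dim))
        self.dec = nn.Sequential(nn.Linear(z_dim, h), nn.ReLU(), nn.Linear(h, x_dim))

    def encode(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        return mu, logvar

    def decode(self, z):
        return self.dec(z)


class CondEBM(nn.Module):
    """Energy E(x | z); conditioning on the latent code is an assumption of this sketch."""
    def __init__(self, x_dim=784, z_dim=64, h=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(x_dim + z_dim, h), nn.SiLU(), nn.Linear(h, 1))

    def forward(self, x, z):
        return self.net(torch.cat([x, z], dim=-1)).squeeze(-1)


def calibrate(ebm, x0, z, steps=10, step_size=0.01):
    """Refine decoder samples x0 with a few Langevin steps on E(x | z); training only."""
    x = x0.detach().clone()
    for _ in range(steps):
        x.requires_grad_(True)
        grad = torch.autograd.grad(ebm(x, z).sum(), x)[0]
        x = (x - 0.5 * step_size * grad
             + (step_size ** 0.5) * torch.randn_like(x)).detach()
    return x


def train_step(vae, ebm, opt_vae, opt_ebm, x, lam=0.5):
    # Standard ELBO terms on the input data.
    mu, logvar = vae.encode(x)
    z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()      # reparameterization
    x_rec = vae.decode(z)
    kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(-1).mean()

    # Generative direction: decode from the prior, then let the EBM calibrate the samples.
    z_p = torch.randn_like(mu)
    x_gen = vae.decode(z_p)
    x_cal = calibrate(ebm, x_gen, z_p)

    # VAE update: data terms plus a calibration term pulling decoder outputs toward
    # their EBM-refined versions; lam is a fixed stand-in for the paper's adaptive weight.
    loss_vae = F.mse_loss(x_rec, x) + kl + lam * F.mse_loss(x_gen, x_cal)
    opt_vae.zero_grad(); loss_vae.backward(); opt_vae.step()

    # EBM update: a generic contrastive objective (lower energy on data, higher on
    # calibrated samples), used here as a stand-in for the paper's training scheme.
    loss_ebm = ebm(x, z.detach()).mean() - ebm(x_cal, z_p).mean()
    opt_ebm.zero_grad(); loss_ebm.backward(); opt_ebm.step()
    return loss_vae.item(), loss_ebm.item()
```

Under this reading, test-time sampling is a single decoder pass, e.g. `vae.decode(torch.randn(n, 64))`: the EBM and its MCMC are used only during training, which is the "test-time free lunch" of the title. The zero-shot restoration application mentioned in the abstract additionally constrains generated content with a range-null space decomposition of the degradation operator, which this sketch omits.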



Acknowledgments

Jing Tang’s work is partially supported by the National Natural Science Foundation of China (NSFC) under Grant No. U22B2060, by the National Key R&D Program of China under Grant No. 2023YFF0725100, by the National Language Commission under Grant No. WT145-39, by the Department of Science and Technology of Guangdong Province under Grant No. 2023A1515110131, by the Guangzhou Municipal Science and Technology Bureau under Grant Nos. 2023A03J0667 and 2024A04J4454, by the Hong Kong Productivity Council (HKPC), and by Createlink Technology Co., Ltd.

Author information


Corresponding author

Correspondence to Jing Tang.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 26484 KB)


Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Luo, Y., Qiu, S., Tao, X., Cai, Y., Tang, J. (2025). Energy-Calibrated VAE with Test Time Free Lunch. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15143. Springer, Cham. https://doi.org/10.1007/978-3-031-73013-9_19


  • DOI: https://doi.org/10.1007/978-3-031-73013-9_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-73012-2

  • Online ISBN: 978-3-031-73013-9

  • eBook Packages: Computer Science, Computer Science (R0)
