Abstract
In this paper, we propose a novel generative model, termed Energy-Calibrated VAE (EC-VAE), that utilizes a conditional Energy-Based Model (EBM) to enhance the Variational Autoencoder (VAE). VAEs often produce blurry samples because they lack tailored training on samples drawn in the generative direction, whereas EBMs can generate high-quality samples but require expensive Markov Chain Monte Carlo (MCMC) sampling. To address both issues, we introduce a conditional EBM that calibrates the generative direction of the VAE during training, without being required for generation at test time. Specifically, we train EC-VAE on both the input data and the calibrated samples with an adaptive weight, improving sample quality while avoiding MCMC sampling at test time. We further extend this calibration idea to variational learning and normalizing flows, and apply EC-VAE to an additional application, zero-shot image restoration, via a neural transport prior and range-null space theory. We evaluate the proposed method on two applications, image generation and zero-shot image restoration, and the experimental results show that it achieves competitive performance among single-step non-adversarial generative models.
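To make the calibration mechanism concrete, below is a minimal sketch of one EC-VAE training step, assuming PyTorch. The `vae` and `ebm` interfaces, the `langevin_refine` helper, and the fixed weight `lam` (a stand-in for the paper's adaptive weight) are hypothetical illustrations, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def langevin_refine(ebm, x, z, steps=10, step_size=0.01):
    """Short-run Langevin dynamics on a conditional EBM E(x | z).
    Used only during training; test-time generation skips MCMC."""
    x = x.detach().requires_grad_(True)
    for _ in range(steps):
        energy = ebm(x, z).sum()
        grad, = torch.autograd.grad(energy, x)
        x = x - 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(x)
        x = x.detach().requires_grad_(True)
    return x.detach()

def ec_vae_loss(vae, ebm, x_real, lam=0.5):
    # Standard VAE terms (reconstruction + KL) on the input data.
    x_rec, mu, logvar = vae(x_real)            # assumed interface
    rec = F.mse_loss(x_rec, x_real)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())

    # Generative direction: decode prior samples, then calibrate them
    # with the conditional EBM via short-run MCMC (training only).
    z = torch.randn_like(mu)
    x_gen = vae.decode(z)                      # assumed interface
    x_cal = langevin_refine(ebm, x_gen, z)

    # Train the decoder toward the calibrated samples; `lam` stands in
    # for the adaptive weight (its schedule, and the EBM's own training
    # loss, are omitted in this sketch).
    cal = F.mse_loss(x_gen, x_cal)
    return rec + kl + lam * cal
```

Note that the Langevin refinement appears only inside the training loss; test-time generation is a single decoder pass over prior samples, which is what avoids MCMC at inference.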
Acknowledgments
Jing Tang's work is partially supported by the National Natural Science Foundation of China (NSFC) under Grant No. U22B2060, by the National Key R&D Program of China under Grant No. 2023YFF0725100, by the National Language Commission under Grant No. WT145-39, by the Department of Science and Technology of Guangdong Province under Grant No. 2023A1515110131, by the Guangzhou Municipal Science and Technology Bureau under Grant Nos. 2023A03J0667 and 2024A04J4454, by the Hong Kong Productivity Council (HKPC), and by Createlink Technology Co., Ltd.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Luo, Y., Qiu, S., Tao, X., Cai, Y., Tang, J. (2025). Energy-Calibrated VAE with Test Time Free Lunch. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15143. Springer, Cham. https://doi.org/10.1007/978-3-031-73013-9_19
Print ISBN: 978-3-031-73012-2
Online ISBN: 978-3-031-73013-9