Controllable image generation based on causal representation learning

Huang, Shanshan; Wang, Yuanhao; Gong, Zhili; Liao, Jun; Wang, Shu; Liu, Li

doi:10.1631/FITEE.2300303

Published: 08 February 2024

Frontiers of Information Technology & Electronic Engineering, Volume 25, pages 135–148 (2024)

Shanshan Huang (黄珊珊)¹ (ORCID: orcid.org/0000-0001-7893-3861),
Yuanhao Wang (王元浩)¹,
Zhili Gong (龚志黎)¹,
Jun Liao (廖军)¹,
Shu Wang (王姝)² &
Li Liu (刘礼)¹ (ORCID: orcid.org/0000-0002-4776-5292)

Abstract

Artificial intelligence generated content (AIGC) has emerged as an indispensable tool for producing large-scale content in various forms, such as images, thanks to the significant role that AI plays in imitation and production. However, interpretability and controllability remain challenges. Existing AI methods often struggle to produce images that are both flexible and controllable while respecting the causal relationships within the images. To address this issue, we have developed a novel method for causal controllable image generation (CCIG) that combines causal representation learning with bi-directional generative adversarial networks (GANs). This approach lets humans control image attributes while preserving the rationality and interpretability of the generated images, and it also allows for the generation of counterfactual images. The key to CCIG lies in using a causal structure learning module to learn the causal relationships between image attributes and in optimizing this module jointly with the encoder, generator, and joint discriminator of the image generation module. By doing so, we can learn causal representations in the image's latent space and use causal intervention operations to control image generation. We conduct extensive experiments on a real-world dataset, CelebA. The experimental results illustrate the effectiveness of CCIG.

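Abstract (translated from Chinese)

Artificial intelligence generated content (AIGC) has become an indispensable tool for producing large-scale content in various forms, and it plays a particularly important role in image generation and editing. However, the interpretability and controllability of image generation and editing remain a challenge. Because existing AI methods ignore the causal relationships within images, they often struggle to generate images that are both flexible and controllable. To address this problem, this paper develops a novel causal controllable image generation method that combines causal representation learning with bi-directional generative adversarial networks. The key to the method is to use a causal structure learning module to learn the causal relationships between image attributes and to optimize it jointly with the encoder, generator, and joint discriminator in the image generation module. With this approach, one can not only learn causal representations in the latent space of images, and thereby achieve causally controllable image editing, but also use causal intervention operations to generate counterfactual images. Finally, extensive experiments are conducted on the real-world dataset CelebA. The experimental results demonstrate the soundness and effectiveness of the proposed method.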
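
The abstract mentions applying causal intervention operations to a learned causal representation. The full text is not reproduced on this page, so the following is only a minimal sketch of what such an intervention can look like, assuming a linear structural causal model over three latent attributes. The function names `sample_latents` and `intervene`, the adjacency matrix `A`, and the noise vector `eps` are hypothetical illustrations and are not taken from CCIG.

```python
import numpy as np

def sample_latents(A, eps):
    """Solve z = A^T z + eps for a linear SCM (assumed form, not CCIG's).

    A[i, j] != 0 means attribute i is a direct cause of attribute j.
    """
    d = A.shape[0]
    return np.linalg.solve(np.eye(d) - A.T, eps)

def intervene(A, eps, index, value):
    """Apply do(z_index = value): cut incoming edges and clamp the value."""
    A_do = A.copy()
    A_do[:, index] = 0.0      # remove edges from parents into `index`
    eps_do = eps.copy()
    eps_do[index] = value     # with no parents left, z_index equals this term
    return sample_latents(A_do, eps_do)

# Hypothetical graph: attribute 0 causes attribute 1 with weight 2;
# attribute 2 is independent of both.
A = np.array([[0.0, 2.0, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
eps = np.array([1.0, 0.5, 0.3])

factual = sample_latents(A, eps)          # z1 = 2 * 1.0 + 0.5 = 2.5
do_cause = intervene(A, eps, 0, 3.0)      # z1 becomes 2 * 3.0 + 0.5 = 6.5
do_effect = intervene(A, eps, 1, 0.0)     # z0 is unchanged at 1.0

print(factual, do_cause, do_effect)
```

In this toy setting, intervening on the cause propagates to its effect, while intervening on the effect leaves the cause untouched. That asymmetry is what distinguishes a causal edit from changing a single latent coordinate in isolation. In a generative model, the intervened latent vector would then be passed to a generator to produce the counterfactual image; that step is omitted here.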

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Author information

Authors and Affiliations

School of Big Data and Software Engineering, Chongqing University, Chongqing, 401331, China
Shanshan Huang (黄珊珊), Yuanhao Wang (王元浩), Zhili Gong (龚志黎), Jun Liao (廖军) & Li Liu (刘礼)
School of Materials and Energy, Southwest University, Chongqing, 400715, China
Shu Wang (王姝)

Contributions

Shanshan HUANG designed the research. Shanshan HUANG, Yuanhao WANG, and Zhili GONG processed the data. Shanshan HUANG drafted the paper. Yuanhao WANG, Jun LIAO, and Shu WANG helped organize the paper. Shanshan HUANG and Li LIU revised and finalized the paper.

Corresponding author

Correspondence to Li Liu (刘礼).

Ethics declarations

All the authors declare that they have no conflict of interest.

Additional information

Project supported by the National Major Science and Technology Projects of China (No. 2022YFB3303302), the National Natural Science Foundation of China (Nos. 61977012 and 62207007), and the Central Universities Project in China at Chongqing University (Nos. 2021CDJYGRH011 and 2020CDJSK06PT14).

About this article

Cite this article

Huang, S., Wang, Y., Gong, Z. et al. Controllable image generation based on causal representation learning. Front Inform Technol Electron Eng 25, 135–148 (2024). https://doi.org/10.1631/FITEE.2300303

Received: 05 May 2023
Accepted: 13 October 2023
Published: 08 February 2024
Issue Date: January 2024
DOI: https://doi.org/10.1631/FITEE.2300303

CLC number

TP391.41

Part of a collection:

FITEE Special Issue on Recent Advances in Artificial Intelligence Generated Content (AIGC)
