Dual-path hypernetworks of style and text for one-shot domain adaptation

Published in: Applied Intelligence

Abstract

Learning a one-shot domain adaptation model is an exciting and challenging topic in computer vision and graphics. A feasible solution is to fine-tune a pre-trained generator to the target domain by leveraging the powerful semantic capabilities of CLIP (Contrastive Language-Image Pretraining). Unfortunately, when the target image differs significantly from the source domain, existing methods overfit, and the generated images do not correctly reflect the texture of the target image. To address this issue, we propose a Dynamic Domain Transfer Strategy (DDTS) that aligns texture information between the source and target domains by dynamically adjusting the direction of domain transfer. Furthermore, the delicately designed dual-path hypernetworks of style and text (Dual-HyperST) for one-shot domain adaptation characterize the target domain's textual style and visual style with a text-guided path and a style-guided path. Specifically, the style-guided path predicts a set of style weight offsets from the target image, and the text-guided path predicts a set of text weight offsets from a text prompt. To better integrate the information from these two paths, we introduce a hypernetwork that learns to modulate the pre-trained generator instead of fine-tuning it. Qualitative and quantitative experiments demonstrate the superiority of Dual-HyperST, which surpasses state-of-the-art methods in the diversity and quality of the generated images.
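For readers who want a concrete picture of the mechanism the abstract describes, the sketch below shows, under loose assumptions, how a style-guided head and a text-guided head of a hypernetwork could each predict weight offsets that modulate one frozen layer of a pre-trained generator. All class names (OffsetHead, DualPathHyperLayer), the embedding dimensions, and the additive fusion of the two offsets are illustrative choices made here, not details taken from the paper; real inputs would be CLIP embeddings of the single target image and of the text prompt.

```python
# Hypothetical sketch, NOT the authors' released code: one frozen generator
# weight modulated by offsets predicted from a style (image) path and a
# text (prompt) path, in the spirit of hypernetwork-based modulation.
import math
import torch
import torch.nn as nn

class OffsetHead(nn.Module):
    """Maps a conditioning embedding to a weight offset of a given shape."""
    def __init__(self, embed_dim, weight_shape):
        super().__init__()
        self.weight_shape = tuple(weight_shape)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 512),
            nn.ReLU(),
            nn.Linear(512, math.prod(self.weight_shape)),
        )

    def forward(self, embedding):
        # embedding: 1D tensor, e.g. a CLIP image or text embedding
        return self.mlp(embedding).view(*self.weight_shape)

class DualPathHyperLayer(nn.Module):
    """Applies style- and text-predicted offsets to one frozen generator weight."""
    def __init__(self, base_weight, style_dim=512, text_dim=512):
        super().__init__()
        # The pre-trained generator weight stays frozen; only the offset
        # heads (the hypernetwork) are trained.
        self.register_buffer("base_weight", base_weight)
        self.style_head = OffsetHead(style_dim, base_weight.shape)  # style-guided path
        self.text_head = OffsetHead(text_dim, base_weight.shape)    # text-guided path

    def forward(self, style_emb, text_emb):
        delta_style = self.style_head(style_emb)  # offset from the target image
        delta_text = self.text_head(text_emb)     # offset from the text prompt
        # Modulate rather than fine-tune: W' = W * (1 + delta_style + delta_text)
        return self.base_weight * (1.0 + delta_style + delta_text)

# Usage sketch with stand-in tensors (real inputs would be CLIP embeddings
# of the target image and the text prompt).
base_w = torch.randn(64, 64, 3, 3)        # stand-in for one generator conv weight
layer = DualPathHyperLayer(base_w)
style_emb = torch.randn(512)              # e.g. CLIP image embedding of the target
text_emb = torch.randn(512)               # e.g. CLIP text embedding of the prompt
modulated_w = layer(style_emb, text_emb)  # weight the generator layer would use
```

The one point the sketch is meant to mirror is that only the offset heads are trainable, so the pre-trained generator is modulated rather than fine-tuned, which is what the abstract credits with avoiding overfitting in the one-shot setting.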



Acknowledgements

This work is supported by the National Science and Technology Foundation of China (nos. 61271361, 61761046, 52102382 and 62061049), the Key Project of the Applied Basic Research Program of the Yunnan Provincial Department of Science and Technology (no. 202001BB050043), the Major Science and Technology Special Project of Yunnan Province (no. 202002AD080001), and the Reserve Talents of Young and Middle-aged Academic and Technical Leaders in Yunnan Province (no. 2019HB121).

Author information

Corresponding author

Correspondence to Yuanyuan Pu.

Ethics declarations

Conflicts of interest

The authors have no relevant financial or non-financial interests to disclose. All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript. The authors have no financial or proprietary interests in any material discussed in this article.

Competing Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, S., Pu, Y., Zhao, Z. et al. Dual-path hypernetworks of style and text for one-shot domain adaptation. Appl Intell 54, 2614–2630 (2024). https://doi.org/10.1007/s10489-023-05229-5
