Dual-path hypernetworks of style and text for one-shot domain adaptation

Published in: Applied Intelligence

Abstract

Learning a one-shot domain adaptation model is an exciting and challenging topic in computer vision and graphics. A feasible solution is to fine-tune a pre-trained generator to the target domain by leveraging the powerful semantic capabilities of CLIP (Contrastive Language-Image Pretraining). Unfortunately, when the target image differs significantly from the source domain, existing methods overfit, and the generated images do not correctly reflect the texture of the target image. To address this issue, we propose a Dynamic Domain Transfer Strategy (DDTS) that aligns texture information between the source and target domains by dynamically adjusting the direction of domain transfer. Furthermore, the delicately designed dual-path hypernetworks of style and text (Dual-HyperST) for one-shot domain adaptation characterize the target domain's textual style and visual style with a text-guided path and a style-guided path. Specifically, the style-guided path predicts a set of style weight offsets from the target image, and the text-guided path predicts a set of text weight offsets from a text prompt. To better integrate the information from these two paths, we introduce a hypernetwork that learns to modulate the pre-trained generator instead of fine-tuning it. Qualitative and quantitative experiments demonstrate the superiority of Dual-HyperST, which surpasses state-of-the-art methods in the diversity and quality of the generated images.
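For readers who want a concrete picture of the mechanism the abstract describes, the sketch below shows, under loose assumptions, how a style-guided head and a text-guided head of a hypernetwork could each predict weight offsets that modulate one frozen layer of a pre-trained generator. All class names (OffsetHead, DualPathHyperLayer), the embedding dimensions, and the additive fusion of the two offsets are illustrative choices made here, not details taken from the paper; real inputs would be CLIP embeddings of the single target image and of the text prompt.

```python
# Hypothetical sketch, NOT the authors' released code: one frozen generator
# weight modulated by offsets predicted from a style (image) path and a
# text (prompt) path, in the spirit of hypernetwork-based modulation.
import math
import torch
import torch.nn as nn

class OffsetHead(nn.Module):
    """Maps a conditioning embedding to a weight offset of a given shape."""
    def __init__(self, embed_dim, weight_shape):
        super().__init__()
        self.weight_shape = tuple(weight_shape)
        self.mlp = nn.Sequential(
            nn.Linear(embed_dim, 512),
            nn.ReLU(),
            nn.Linear(512, math.prod(self.weight_shape)),
        )

    def forward(self, embedding):
        # embedding: 1D tensor, e.g. a CLIP image or text embedding
        return self.mlp(embedding).view(*self.weight_shape)

class DualPathHyperLayer(nn.Module):
    """Applies style- and text-predicted offsets to one frozen generator weight."""
    def __init__(self, base_weight, style_dim=512, text_dim=512):
        super().__init__()
        # The pre-trained generator weight stays frozen; only the offset
        # heads (the hypernetwork) are trained.
        self.register_buffer("base_weight", base_weight)
        self.style_head = OffsetHead(style_dim, base_weight.shape)  # style-guided path
        self.text_head = OffsetHead(text_dim, base_weight.shape)    # text-guided path

    def forward(self, style_emb, text_emb):
        delta_style = self.style_head(style_emb)  # offset from the target image
        delta_text = self.text_head(text_emb)     # offset from the text prompt
        # Modulate rather than fine-tune: W' = W * (1 + delta_style + delta_text)
        return self.base_weight * (1.0 + delta_style + delta_text)

# Usage sketch with stand-in tensors (real inputs would be CLIP embeddings
# of the target image and the text prompt).
base_w = torch.randn(64, 64, 3, 3)        # stand-in for one generator conv weight
layer = DualPathHyperLayer(base_w)
style_emb = torch.randn(512)              # e.g. CLIP image embedding of the target
text_emb = torch.randn(512)               # e.g. CLIP text embedding of the prompt
modulated_w = layer(style_emb, text_emb)  # weight the generator layer would use
```

The one point the sketch is meant to mirror is that only the offset heads are trainable, so the pre-trained generator is modulated rather than fine-tuned, which is what the abstract credits with avoiding overfitting in the one-shot setting.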



Acknowledgements

This work is supported by the National Science and Technology Foundation of China (nos. 61271361, 61761046, 52102382 and 62061049), the Key Project of the Applied Basic Research Program of the Yunnan Provincial Department of Science and Technology (no. 202001BB050043), the Major Science and Technology Special Project of Yunnan Province (no. 202002AD080001), and the Reserve Talents of Young and Middle-aged Academic and Technical Leaders in Yunnan Province (no. 2019HB121).

Author information

Corresponding author

Correspondence to Yuanyuan Pu.

Ethics declarations

Conflicts of interest

The authors have no relevant financial or non-financial interests to disclose. All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or non-financial interest in the subject matter or materials discussed in this manuscript. The authors have no financial or proprietary interests in any material discussed in this article.

Competing Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Li, S., Pu, Y., Zhao, Z. et al. Dual-path hypernetworks of style and text for one-shot domain adaptation. Appl Intell 54, 2614–2630 (2024). https://doi.org/10.1007/s10489-023-05229-5
