
Style Image Harmonization via Global-Local Style Mutual Guided

  • Conference paper

Computer Vision – ACCV 2022 (ACCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13847)

Abstract

Style image harmonization attaches a region of a source image to a target style image to form a harmonious composite. Existing methods often suffer from foreground distortion, content loss, and semantic inconsistency caused by excessive transfer of local style. In this paper, we present a framework for style image harmonization in which global and local styles mutually guide each other to mitigate these problems. Specifically, we extract global information with a Vision Transformer and local information with a Convolutional Neural Network, and adaptively fuse the two kinds of information under a multi-scale fusion structure to reduce the disharmony between foreground and background styles. We then train a blending network, GradGAN, to smooth the image gradient. Finally, we take both style and gradient into consideration to resolve abrupt gradient changes at the blending boundary. Moreover, our training process requires no supervision. Experimental results show that our algorithm balances global and local styles when stylizing the foreground, retaining the original information of the object while keeping the boundary gradient smooth, and compares favorably with existing methods.
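The adaptive multi-scale fusion of global (Vision Transformer) and local (CNN) features described in the abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: the function name `fuse_global_local`, the softmax-based per-scale weighting, and the toy shapes are all assumptions based only on the abstract's high-level description.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_global_local(global_feats, local_feats, logits):
    """Adaptively fuse per-scale global and local feature maps.

    global_feats, local_feats: lists of (C, H, W) arrays, one per scale
        (stand-ins for ViT and CNN branch outputs).
    logits: (num_scales, 2) scores; softmax over the last axis yields
        the mixing weights between the two branches at each scale.
    """
    weights = softmax(logits, axis=-1)           # (num_scales, 2)
    fused = []
    for s, (g, l) in enumerate(zip(global_feats, local_feats)):
        w_g, w_l = weights[s]
        fused.append(w_g * g + w_l * l)          # convex combination per scale
    return fused

# Toy usage: two scales, channels-first feature maps.
g = [np.ones((4, 8, 8)), np.ones((4, 4, 4))]
l = [np.zeros((4, 8, 8)), np.zeros((4, 4, 4))]
out = fuse_global_local(g, l, np.zeros((2, 2)))  # zero logits -> equal weights
```

In the paper the weights would be learned jointly with the rest of the network; here zero logits simply yield an equal blend of the two branches at every scale.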



Author information

Corresponding author: Sanyuan Zhang


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 18948 KB)


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Yan, X., Lu, Y., Shuai, J., Zhang, S. (2023). Style Image Harmonization via Global-Local Style Mutual Guided. In: Wang, L., Gall, J., Chin, TJ., Sato, I., Chellappa, R. (eds) Computer Vision – ACCV 2022. ACCV 2022. Lecture Notes in Computer Science, vol 13847. Springer, Cham. https://doi.org/10.1007/978-3-031-26293-7_15


  • DOI: https://doi.org/10.1007/978-3-031-26293-7_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-26292-0

  • Online ISBN: 978-3-031-26293-7

  • eBook Packages: Computer Science; Computer Science (R0)
