Abstract
Semantic image editing aims to modify an image according to an input semantic label map, carrying out the required manipulation. Existing approaches typically offer little control over the style of the edit, and thus cannot support the desired level of customization, e.g., rendering an object in a particular style or picking a specific instance. In this work, we propose Style Adaptive Semantic Image Editing (SASIE), in which a reference image serves as an additional style input that guides the manipulation process in a more adaptive manner. We further propose a new transformer-based architecture for SASIE, whose intra-/inter-image multi-head self-attention blocks transfer knowledge within and across images: the content of the edited areas is synthesized according to the given semantic label, while their style is inherited from the reference image. Extensive experiments on multiple datasets suggest that our method is highly effective and enables customizable image manipulation.
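The distinction the abstract draws between intra- and inter-image attention can be illustrated with a minimal sketch. This is not the paper's actual architecture: all names are illustrative, the learned Q/K/V/output projections of a real transformer block are omitted, and only the routing of queries versus keys/values differs between the two cases.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(q_tokens, kv_tokens, num_heads=4):
    """Scaled dot-product attention over token features.

    Queries come from q_tokens; keys and values come from kv_tokens.
    When both are the same tensor this is intra-image self-attention;
    when kv_tokens come from another image it is inter-image attention.
    (Real blocks also apply learned projections per head.)
    """
    n_q, d = q_tokens.shape
    d_h = d // num_heads
    out = np.empty_like(q_tokens)
    for h in range(num_heads):
        s = slice(h * d_h, (h + 1) * d_h)
        q, k, v = q_tokens[:, s], kv_tokens[:, s], kv_tokens[:, s]
        attn = softmax(q @ k.T / np.sqrt(d_h))  # (n_q, n_kv)
        out[:, s] = attn @ v
    return out

# Tokens of the edited region and of the style reference image
# (hypothetical shapes: 16 and 24 tokens with 32-dim features).
edit_tokens = np.random.default_rng(0).normal(size=(16, 32))
ref_tokens = np.random.default_rng(1).normal(size=(24, 32))

# intra-image: edited-region tokens attend to themselves (content)
intra = multi_head_attention(edit_tokens, edit_tokens)
# inter-image: edited-region tokens attend to the reference (style)
inter = multi_head_attention(edit_tokens, ref_tokens)
```

The key point is that both blocks share the same attention mechanism; only the source of the keys and values changes, which is how style information from the reference image can flow into the edited region.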
E. Günther and R. Gong—These authors contributed equally to this work.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Günther, E., Gong, R., Van Gool, L. (2023). Style Adaptive Semantic Image Editing with Transformers. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13802. Springer, Cham. https://doi.org/10.1007/978-3-031-25063-7_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25062-0
Online ISBN: 978-3-031-25063-7
eBook Packages: Computer Science (R0)