Abstract
Facial attribute transfer aims to transfer a target facial attribute, such as a beard, bangs, or an open mouth, to a source facial image that lacks it, while keeping the non-target attributes of the face intact. Existing methods are essentially oriented toward homogeneous images: they transfer target attributes to (or between) photorealistic facial images. In this paper, we address facial attribute transfer between heterogeneous images, a new and more challenging task. Specifically, we propose a bi-directional facial attribute transfer method based on GAN (generative adversarial network) and latent representations for instance-based facial attribute transfer: a target facial attribute, together with its basic shape, is transferred from a reference photorealistic facial image to a source realistic portrait illustration, and vice versa (i.e., the target attribute is erased from the facial image). The key challenges in our work are to keep the visual style of the transferred attribute consistent in the heterogeneous result images and to overcome the information-dimensionality imbalance between photorealistic facial images and realistic portrait illustrations. Unlike previous latent-representation-based facial attribute transfer methods, which mix content and visual style in a single latent representation, we treat the content and visual style of an image separately in latent representation learning, using a composite encoder built from a convolutional neural network and a fully connected neural network. This design turns out to preserve visual style consistency well. In addition, we introduce different multipliers for the weights of the loss terms in our objective functions to balance the information imbalance between heterogeneous images. Experiments show that our method achieves facial attribute transfer between heterogeneous images with good results. For quantitative analysis, we also report FID scores of our method on several datasets to demonstrate its effectiveness.
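The FID score mentioned above compares the statistics of deep features extracted from real and generated images. As a minimal illustrative sketch (not the paper's evaluation code; real evaluations fit the Gaussians to Inception-network features rather than the toy random features used here), the Fréchet distance between two fitted Gaussians can be computed as:

```python
import numpy as np
from scipy import linalg

def fid(mu1, sigma1, mu2, sigma2):
    """Frechet Inception Distance between two Gaussians (mu, sigma)
    fitted to feature vectors of real and generated images:
    ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})."""
    diff = mu1 - mu2
    # Matrix square root of the product of the covariance matrices
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary numerical noise
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Toy "features" standing in for Inception activations
rng = np.random.default_rng(0)
feats_real = rng.normal(size=(1000, 8))
feats_fake = rng.normal(loc=0.5, size=(1000, 8))
mu_r, sig_r = feats_real.mean(0), np.cov(feats_real, rowvar=False)
mu_f, sig_f = feats_fake.mean(0), np.cov(feats_fake, rowvar=False)
print(fid(mu_r, sig_r, mu_f, sig_f))  # larger when distributions differ
```

A lower FID indicates that the generated image distribution is closer to the real one; identical statistics give a score of (numerically) zero.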
Abbreviations
- \(*\): The state with or without the target attribute, \(*\in \left\{ 0,1\right\}\), where 0 denotes the state without the target attribute and 1 the state with it
- \({\Theta }^*\): A sample with or without the target attribute
- \(a^{{\Theta }^*}\): The target attribute code of \({\Theta }^*\)
- \(z^{{\Theta }^*}\): The non-target attribute code of \({\Theta }^*\)
- \(c^{{\Theta }^*}\): The content code of \({\Theta }^*\), where \(c^{{\Theta }^*} = (a^{{\Theta }^*},z^{{\Theta }^*} )\)
- \(s^{{\Theta }^*}\): The style code of \({\Theta }^*\)
- \(D_{a^{{\Theta }^*}}\): The dimension of \(a^{{\Theta }^*}\)
- \(x^0\): A photorealistic facial image without the target attribute
- \(x^1\): A photorealistic facial image with the target attribute
- \(y^0\): A realistic portrait illustration without the target attribute
- \(y^1\): A realistic portrait illustration with the target attribute
- \(x_{\mathrm{trans}}^0\): The result image of editing the photorealistic facial image
- \(y_{\mathrm{trans}}^1\): The result image of editing the realistic portrait illustration
- \(X^0\): The domain of photorealistic facial images without the target attribute
- \(X^1\): The domain of photorealistic facial images with the target attribute
- \(Y^0\): The domain of realistic portrait illustrations without the target attribute
- \(Y^1\): The domain of realistic portrait illustrations with the target attribute
- \(E_{X^*}^C\): The content encoder for \(x^*\)
- \(E_{X^*}^S\): The style encoder for \(x^*\)
- \(E_{X^*}\): The encoder for \(x^*\), where \(E_{X^*}=(E_{X^*}^C,E_{X^*}^S)\)
- \(E_{Y^*}^C\): The content encoder for \(y^*\)
- \(E_{Y^*}^S\): The style encoder for \(y^*\)
- \(E_{Y^*}\): The encoder for \(y^*\), where \(E_{Y^*}=(E_{Y^*}^C,E_{Y^*}^S)\)
- \(G_{X^*}\): The generator that generates \(x^*\)
- \(G_{Y^*}\): The generator that generates \(y^*\)
- \(D_{X^0}\): The discriminator that distinguishes \(x^0\) and \(x_{\mathrm{trans}}^0\) images
- \(D_{Y^1}\): The discriminator that distinguishes \(y^1\) and \(y_{\mathrm{trans}}^1\) images
- \(L\): The loss function
- \(\lambda\): The weight
- \(\alpha\): The multiplier for \(\lambda\)
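Using the notation above, the wiring of codes for the transfer direction \(Y^0 \rightarrow Y^1\) can be sketched as follows. This is a toy sketch, not the paper's networks: the encoders and generator here are placeholder callables operating on strings, whereas the actual \(E\) and \(G\) are CNN-plus-fully-connected networks operating on images. What the sketch shows is the code composition implied by the definitions: the target attribute code \(a^{x^1}\) is taken from the reference photo, while the illustration's non-target code \(z^{y^0}\) and style code \(s^{y^0}\) are kept.

```python
from typing import Callable, NamedTuple

class ContentCode(NamedTuple):
    a: str  # target-attribute code a^Theta
    z: str  # non-target-attribute code z^Theta

class Encoder(NamedTuple):
    content: Callable  # stands in for E^C
    style: Callable    # stands in for E^S

def transfer(source_y0, reference_x1, E_Y0, E_X1, G_Y1):
    """Y^0 -> Y^1: take the target attribute code from the photo,
    keep the illustration's non-target content and style codes,
    and decode the recombined codes with the generator."""
    c_y0 = E_Y0.content(source_y0)       # c^{y^0} = (a^{y^0}, z^{y^0})
    s_y0 = E_Y0.style(source_y0)         # s^{y^0}: illustration style kept
    a_x1 = E_X1.content(reference_x1).a  # a^{x^1}: attribute to transfer
    return G_Y1(ContentCode(a=a_x1, z=c_y0.z), s_y0)

# Toy stand-ins: codes are strings instead of network activations.
E_Y0 = Encoder(content=lambda y: ContentCode(a="no-beard", z=y["identity"]),
               style=lambda y: "illustration")
E_X1 = Encoder(content=lambda x: ContentCode(a=x["attribute"], z=x["identity"]),
               style=lambda x: "photo")
G_Y1 = lambda c, s: {"attribute": c.a, "identity": c.z, "style": s}

y_trans = transfer({"identity": "alice"},
                   {"attribute": "beard", "identity": "bob"},
                   E_Y0, E_X1, G_Y1)
print(y_trans)  # {'attribute': 'beard', 'identity': 'alice', 'style': 'illustration'}
```

The erasing direction \(X^1 \rightarrow X^0\) is symmetric: the generator \(G_{X^0}\) decodes the photo's codes with the target attribute code removed.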
Funding
This work was supported by National Natural Science Foundation of China [No. 61672158] and Natural Science Foundation of Fujian Province [No. 2018J01798].
Ethics declarations
Conflict of interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Shi, Rx., Ye, Dy. & Chen, Zj. A bi-directional facial attribute transfer framework: transfer your single facial attribute to a portrait illustration. Neural Comput & Applic 34, 253–270 (2022). https://doi.org/10.1007/s00521-021-06360-5