
Fine-grained facial image-to-image translation with an attention based pipeline generative adversarial framework

Published in: Multimedia Tools and Applications

Abstract

Fine-grained feature detection and recognition is an important but challenging task due to low resolution and noisy representations. Synthesizing images with a specified tiny feature is even more difficult. Existing image-to-image generation studies usually focus on improving generation resolution and strengthening representation learning for coarse features; generating images with fine-grained attributes under an image-to-image framework remains hard. In this paper, we propose an attention-based pipeline generative adversarial network (Atten-Pip-GAN) to generate various facial images with multi-label fine-grained attributes from only a neutral facial image. First, we use a pipeline adversarial structure to generate images with multiple features step by step. Second, we use an independent image-to-image framework as a preprocessing method to detect small fine-grained features and provide an attention map that improves the generation of delicate features. Third, we propose an attention-based location loss to further improve generation quality on small fine-grained features. We apply this method to the open facial image database RaFD and demonstrate the effectiveness of Atten-Pip-GAN in generating facial images with fine-grained attributes.
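The abstract names an attention-based location loss but this excerpt gives no formula. A minimal sketch of one plausible form is an attention-weighted L1 reconstruction loss, where per-pixel errors are scaled by an attention map over the fine-grained regions; the function name, the normalization by attention mass, and the `eps` term are all assumptions for illustration, not the paper's definition:

```python
import numpy as np

def attention_location_loss(real, fake, attention, eps=1e-8):
    # Pixel-wise absolute difference between the real and generated images.
    diff = np.abs(real.astype(np.float64) - fake.astype(np.float64))
    # Weight errors by the attention map so that mistakes inside the small
    # fine-grained regions dominate the loss.
    weighted = attention * diff
    # Normalize by the attention mass so the loss scale does not depend on
    # how large the attended region is.
    return weighted.sum() / (attention.sum() + eps)

# Toy example: a 4x4 "image" pair differing only outside the attended region.
real = np.zeros((4, 4))
fake = np.zeros((4, 4))
fake[0, 0] = 1.0          # error in an unattended corner
attention = np.zeros((4, 4))
attention[2:, 2:] = 1.0   # attend only to the bottom-right patch
print(attention_location_loss(real, fake, attention))  # 0.0: unattended error ignored
```

Under this formulation, errors outside the attended region contribute nothing, which matches the stated goal of concentrating the generator's capacity on delicate features.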



Acknowledgments

This work was funded by the National Natural Science Foundation of China under Grant Number 61701463, the Natural Science Foundation of Shandong Province of China under Grant Number ZR2017BF011, and the Fundamental Research Funds for the Central Universities under Grant Number 201822014.

Author information


Corresponding author

Correspondence to Zhibin Yu.



About this article


Cite this article

Zhao, Y., Zheng, Z., Wang, C. et al. Fine-grained facial image-to-image translation with an attention based pipeline generative adversarial framework. Multimed Tools Appl 79, 14981–15000 (2020). https://doi.org/10.1007/s11042-019-08346-x

