
Unsupervised Object Transfiguration with Attention


Abstract

Object transfiguration is a subtask of image-to-image translation that maps between two independent image sets and has a wide range of applications. Recently, several studies based on the Generative Adversarial Network (GAN) have achieved impressive results in image-to-image translation. However, object transfiguration should translate only the regions containing target objects rather than whole images; most existing methods ignore this constraint, which results in mistranslated backgrounds. To address this problem, we present a novel pipeline called Deep Attention Unit Generative Adversarial Networks (DAU-GAN). During translation, the DAU computes attention masks that indicate where the target objects are, making the GAN concentrate on translating target objects while leaving the irrelevant background untouched. Additionally, we construct an attention-consistent loss and a background-consistent loss that compel the model to focus its translation on target objects and to preserve backgrounds more effectively. Comparison experiments on three popular related datasets demonstrate that DAU-GAN outperforms the state-of-the-art. We also export the attention masks at different stages to confirm their effect during the object transfiguration task. The proposed DAU-GAN translates objects effectively while preserving background information. In our model, the DAU learns to focus on the most important information by producing attention masks; these masks enable DAU-GAN to distinguish target objects from backgrounds during translation and to achieve impressive results on two subsets of ImageNet and on CelebA. Moreover, the results show that we can investigate the model not only from the image itself but also from other modal information.
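
To make the pipeline concrete, the following is a minimal PyTorch sketch of the attention-masked translation and the two consistency terms described in the abstract. All names (DeepAttentionUnit, composite, background_consistent_loss, attention_consistent_loss), the layer sizes, and the exact L1 form of the losses are illustrative assumptions, not the paper's implementation.

    # Minimal sketch of attention-masked translation; names and loss
    # forms are assumptions for illustration, not the paper's exact code.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DeepAttentionUnit(nn.Module):
        """Hypothetical DAU: predicts a soft mask in [0, 1] marking target objects."""
        def __init__(self, in_channels):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(in_channels, in_channels, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels, 1, kernel_size=1),
            )

        def forward(self, x):
            # Sigmoid keeps the mask soft, so gradients reach both regions.
            return torch.sigmoid(self.net(x))

    def composite(x, translated, mask):
        # Translate only the attended region; copy the background from the input.
        return mask * translated + (1.0 - mask) * x

    def background_consistent_loss(x, y, mask):
        # L1 penalty on any change outside the attended region.
        return F.l1_loss((1.0 - mask) * y, (1.0 - mask) * x)

    def attention_consistent_loss(mask_xy, mask_yx):
        # Forward and backward masks should agree on where the object is.
        return F.l1_loss(mask_xy, mask_yx)

In a CycleGAN-style training loop, composite would replace the raw generator output, and the two penalties would be added alongside the usual adversarial and cycle-consistency losses.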


Funding

This work was supported by the National Natural Science Foundation of China (Nos. 61876121, 61472267, 61728205, 61502329, and 61672371), the Primary Research & Development Plan of Jiangsu Province (No. BE2017663), the Aeronautical Science Foundation (No. 20151996016), the Jiangsu Key Disciplines of the Thirteenth Five-Year Plan (No. 20168765), and the Suzhou Institute of Trade & Commerce Research Project (KY-ZRA1805).

Author information


Corresponding authors

Correspondence to Linyan Li or Fuyuan Hu.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Ye, Z., Lyu, F., Li, L. et al. Unsupervised Object Transfiguration with Attention. Cogn Comput 11, 869–878 (2019). https://doi.org/10.1007/s12559-019-09633-3
