Neurocomputing

Volume 443, 5 July 2021, Pages 75-84

Brief papers
URCA-GAN: UpSample Residual Channel-wise Attention Generative Adversarial Network for image-to-image translation

https://doi.org/10.1016/j.neucom.2021.02.054

Abstract

Multimodal image-to-image translation is a challenging topic in computer vision. In image-to-image translation, an image is translated from a source domain to different target domains. For many translation tasks, the difference between the source image and the target image lies only in the foreground. In this paper, we propose a novel deep-learning-based method for image-to-image translation. Our method, named URCA-GAN, is based on a generative adversarial network and generates images of higher quality and diversity than existing methods. We introduce Upsample Residual Channel-wise Attention Blocks (URCABs), based on ResNet and softmax channel-wise attention, to extract features associated with the foreground. The URCABs form a parallel architecture module named the Upsample Residual Channel-wise Attention Module (URCAM), which merges the features from the URCABs. URCAM is embedded after the decoder in the generator to regulate image generation. Experimental results and quantitative evaluations show that our model outperforms current state-of-the-art methods in both quality and diversity. In particular, the LPIPS, PSNR, and SSIM of URCA-GAN on the CelebA dataset increase by 1.31%, 1.66%, and 4.74% respectively, and the PSNR and SSIM on the RaFD dataset increase by 1.35% and 6.71% respectively. In addition, visualization of the features from the URCABs demonstrates that our model puts emphasis on foreground features.

Introduction

Image-to-image translation is the task of generating images in a target domain from images in a source domain via a learned mapping. Applications include image colorization [1], super-resolution image generation [2], [3], style transfer [4] and others. Researchers have proposed many deep learning methods for different image-to-image translation tasks; for example, changing a person's emotion, or changing summer scenery to winter scenery while keeping the same objects in the scene. The results for image-to-image generation have greatly improved in recent years due to the introduction of Generative Adversarial Networks (GANs) [5] for this task. A GAN-based method usually contains an encoder that maps the source image into a common latent feature space through convolutions, and a decoder that maps those latent features to a target-domain image through transposed convolutions.
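The shape arithmetic of such an encoder-decoder can be sketched as follows. The kernel size, stride, and padding used here (k=4, s=2, p=1, a common choice in GAN encoder-decoders) are illustrative assumptions, not values taken from the paper:

```python
def conv_out_size(n, k, s, p):
    # Spatial size after a strided convolution (encoder downsampling).
    return (n + 2 * p - k) // s + 1

def deconv_out_size(n, k, s, p):
    # Spatial size after a transposed convolution (decoder upsampling).
    return (n - 1) * s - 2 * p + k

# A k=4, s=2, p=1 transposed convolution exactly inverts the
# downsampling of a k=4, s=2, p=1 convolution: 128 -> 64 -> 128.
```

With these settings, a 128×128 input is halved to 64×64 by each encoder stage and restored to the original resolution by the matching decoder stage.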

We note that for many image-to-image translation tasks, only parts of the image need to be transformed, not the entire image. For example, if we would like to change the emotion of a person from happy to angry, the changes should only be in the facial region; the hair color, hair style, and clothing should stay the same. Researchers have proposed many image-to-image translation algorithms to achieve this. One popular method is CycleGAN [6]. By enforcing cycle consistency between the source domain and the target domain, the content shared by the source image and the target image can be retained during cross-domain translation. Other methods, such as DualGAN [7] and DiscoGAN [8], use a similar principle to preserve image context. To tackle translation among multiple domains, methods such as MUNIT [9] and StarGAN [10] have been proposed. Our method is based on the StarGAN network, which uses one generator and one discriminator to achieve multi-domain translation.

In the translation process, a good method should be able to focus on the image regions where the source image and the target image differ. This is analogous to the visual attention mechanism in the human visual system. Inspired by the human visual system and the successful applications of attention mechanisms in many computer vision algorithms, we propose an UpSample Residual Channel-wise Attention Generative Adversarial Network (URCA-GAN) for image-to-image translation. In URCA-GAN, we embed a novel module (the UpSample Residual Channel-wise Attention Module, URCAM) after the decoder in the generator; it performs feature filtering through a number of parallel UpSample Residual Channel-wise Attention Blocks (URCABs). URCAB is a neural network block based on ResNet [11]; it uses feature residuals to control the feature content and extracts features of interest for translation through softmax channel-wise attention. The embedded URCAM is jointly trained with the StarGAN network to emphasize the features that are most important and discriminative in the channel feature maps.
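As a rough illustration of the softmax channel-wise attention idea, here is a minimal NumPy sketch. The actual URCAB additionally involves upsampling and a residual path, and its exact layer layout is not reproduced here; the C-scaling of the weights is an assumption chosen so that uniform attention leaves the features unchanged:

```python
import numpy as np

def channel_softmax_attention(feat):
    """Reweight the channels of a (C, H, W) feature map.

    Each channel is summarized by global average pooling, the
    descriptors are passed through a softmax across channels, and
    each channel is rescaled by its attention weight (times C).
    """
    c, h, w = feat.shape
    desc = feat.mean(axis=(1, 2))                 # (C,) per-channel descriptor
    desc = desc - desc.max()                      # numerical stability
    weights = np.exp(desc) / np.exp(desc).sum()   # softmax over channels
    # Scale by C so that uniform attention (1/C each) is an identity map.
    return feat * (c * weights)[:, None, None], weights
```

A feature map whose channels all have equal means is left unchanged, since every softmax weight equals 1/C and the C-scaling cancels it.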

Our contributions in this paper are as follows:

  • We propose URCAB, a novel residual block that uses feature residuals and channel-wise attention to improve feature filtering, and we combine multiple URCABs into URCAM with a parallel structure.

  • We embed the URCAM module into the generator of the StarGAN network and propose the URCA-GAN for image-to-image translation.

  • We visualize the feature maps extracted by the URCABs in URCAM, showing the different features each block captures, and analyze their effect.

  • Our experimental results demonstrate that our method improves the quality and diversity of synthetic images.

Section snippets

Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs) [5] provide an effective way to generate images. A typical GAN model consists of two main modules: a generator and a discriminator. The discriminator is used to distinguish whether an input image is real or synthetic, and the generator tries its best to fool the discriminator into judging the synthetic images it produces as real. Through adversarial training between the generator and the discriminator, the ability of the…
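The adversarial objective described above can be written down directly. This sketch uses the standard GAN discriminator loss and the non-saturating generator loss on scalar discriminator outputs; it is a generic illustration of adversarial training, not the specific loss formulation of URCA-GAN:

```python
import numpy as np

def discriminator_loss(d_real, d_fake):
    # The discriminator maximizes log D(x) + log(1 - D(G(z)));
    # we return the negated value so it can be minimized as a loss.
    return -(np.log(d_real) + np.log(1.0 - d_fake))

def generator_loss(d_fake):
    # Non-saturating generator objective: maximize log D(G(z)),
    # i.e. push the discriminator's score on fakes toward 1.
    return -np.log(d_fake)
```

The generator's loss shrinks as its samples become more convincing (d_fake closer to 1), which is exactly the "fooling" dynamic the text describes.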

Our approach

In this section, we present the details of our proposed URCA-GAN. The goal of our method is to produce a mapping from the source domain to the target domain. In this mapping, we only need to modify the foreground content at locations where the source image and the target image differ. In practice, the residuals in ResNet represent the changes in features during the training of the neural network. Inspired by this, we decide to control the residual features in…
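The residual idea the paragraph refers to can be sketched as follows: a ResNet-style block adds a learned transform back onto its input, so the transform only has to model the *change* in the features. Here `transform` is a stand-in for the block's convolutional layers, not the paper's actual URCAB internals:

```python
import numpy as np

def residual_block(x, transform):
    # ResNet identity shortcut: the block learns only the residual
    # (the change between input and output features), so a zero
    # residual leaves the input untouched.
    return x + transform(x)
```

If the residual branch outputs zero, the features pass through unchanged, which is why residuals are a natural handle for "change only the foreground" edits.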

Datasets

We chose the CelebFaces Attributes (CelebA) [46] and the Radboud Faces Database (RaFD) [47] datasets. For the CelebA dataset, the training set contains 200,599 face images and the test set contains 2,000 images, randomly selected from the dataset. The initial size of the images is 178×218. They were cropped to a size of 178×178 and then resized to 128×128. We define seven domains in our experiments for multi-domain translation: black hair, blond hair, brown hair, male, female, young, and old.
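The crop-and-resize step can be sketched in NumPy as follows. The paper does not state the crop anchor or the interpolation method, so a center crop and nearest-neighbor resize are assumed here:

```python
import numpy as np

def preprocess(img):
    """Center-crop a 178x218 CelebA image (H, W, C array) to
    178x178, then nearest-neighbor resize to 128x128."""
    h, w = img.shape[:2]            # expected 218 x 178
    top = (h - w) // 2              # crop the taller axis to a square
    square = img[top:top + w, :]    # (178, 178, C)
    idx = (np.arange(128) * w) // 128   # nearest-neighbor sample grid
    return square[idx][:, idx]
```

In practice a library resize (e.g. with bilinear interpolation) would typically be used instead; the sketch only makes the shape pipeline 178×218 → 178×178 → 128×128 concrete.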

Conclusion

In this paper, we have proposed URCA-GAN, a novel deep learning network for upsampling in image-to-image translation. In URCA-GAN, we embed a module named URCAM, which is composed of a number of parallel URCABs, each a modified residual block with softmax channel-wise attention. A single URCAB is used to extract and refine a single foreground-related feature in the feature maps. The enhanced features from the different parallel URCABs are merged to form the output of URCAM. The quantitative…
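The parallel structure can be sketched as follows. The merge operation is assumed to be an element-wise mean purely for illustration; the paper's exact fusion is not reproduced in this excerpt:

```python
import numpy as np

def urcam_merge(feat, branches):
    """Apply several parallel branches (e.g. URCAB-like attention
    blocks) to the same feature map and merge their outputs
    (assumed here: element-wise mean)."""
    outs = [branch(feat) for branch in branches]
    return np.mean(outs, axis=0)
```

With identity branches the merge returns the input unchanged, so each branch contributes only the feature refinement it actually performs.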

CRediT authorship contribution statement

Xuan Nie: Conceptualization, Methodology, Software, Writing - original draft, Writing - review & editing, Supervision. Haoxuan Ding: Conceptualization, Methodology, Software, Validation, Formal analysis, Writing - original draft. Manhua Qi: Software, Validation, Investigation, Resources. Yifei Wang: Validation, Investigation, Resources. Edward K. Wong: Writing - review & editing, Visualization, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


References (51)

  • Y. Cao, Z. Zhou, W. Zhang, Y. Yu, Unsupervised diverse colorization via generative adversarial networks, in: Machine...
  • C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A.P. Aitken, A. Tejani, J. Totz, Z. Wang, W....
  • B. Wu, H. Duan, Z. Liu, G. Sun, SRPGAN: perceptual generative adversarial network for single image super resolution,...
  • P. Isola, J. Zhu, T. Zhou, A.A. Efros, Image-to-image translation with conditional adversarial networks, in: 2017 IEEE...
  • I.J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A.C. Courville, Y. Bengio, Generative...
  • J. Zhu, T. Park, P. Isola, A.A. Efros, Unpaired image-to-image translation using cycle-consistent adversarial networks,...
  • Z. Yi, H.R. Zhang, P. Tan, M. Gong, Dualgan: Unsupervised dual learning for image-to-image translation, in: IEEE...
  • T. Kim, M. Cha, H. Kim, J.K. Lee, J. Kim, Learning to discover cross-domain relations with generative adversarial...
  • X. Huang, M. Liu, S.J. Belongie, J. Kautz, Multimodal unsupervised image-to-image translation, in: Computer Vision -...
  • Y. Choi, M. Choi, M. Kim, J. Ha, S. Kim, J. Choo, Stargan: Unified generative adversarial networks for multi-domain...
  • K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: 2016 IEEE Conference on Computer...
  • A. Radford, L. Metz, S. Chintala, Unsupervised representation learning with deep convolutional generative adversarial...
  • M. Mirza, S. Osindero, Conditional generative adversarial nets, CoRR...
  • X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, P. Abbeel, Infogan: Interpretable representation learning by...
  • M. Arjovsky, S. Chintala, L. Bottou, Wasserstein GAN, CoRR...
  • I. Gulrajani et al., Improved training of Wasserstein GANs
  • T. Wang, M. Liu, J. Zhu, A. Tao, J. Kautz, B. Catanzaro, High-resolution image synthesis and semantic manipulation with...
  • G. Perarnau, J. van de Weijer, B. Raducanu, J.M. Álvarez, Invertible conditional gans for image editing, CoRR...
  • M. Li, W. Zuo, D. Zhang, Deep identity-aware transfer of facial attributes, CoRR...
  • M. Liu, O. Tuzel, Coupled generative adversarial networks, in: Advances in Neural Information Processing Systems 29:...
  • M. Liu et al., Unsupervised image-to-image translation networks

  • D.P. Kingma, M. Welling, Auto-encoding variational bayes, in: 2nd International Conference on Learning Representations,...
  • O. Vinyals, A. Toshev, S. Bengio, D. Erhan, Show and tell: A neural image caption generator, in: IEEE Conference on...
  • K. Xu, J. Ba, R. Kiros, K. Cho, A.C. Courville, R. Salakhutdinov, R.S. Zemel, Y. Bengio, Show, attend and tell: Neural...
  • J. Lu, C. Xiong, D. Parikh, R. Socher, Knowing when to look: Adaptive attention via a visual sentinel for image...

    Xuan Nie was born in 1976. He is an associate professor with the School of Software, Northwestern Polytechnical University, Xi'an, China. He received the B.S., M.S., and Ph.D. degrees in Automatic Control, Pattern Recognition, and Computer Application Technology from Northwestern Polytechnical University, Xi'an, China, in 1998, 2001, and 2005, respectively. He joined the School of Software and Microelectronics, Northwestern Polytechnical University, Xi'an, China, in 2006 as a lecturer and has been an Associate Professor since 2010. He was a visiting professor at Hong Kong Polytechnic University in 2010 and at the University of Michigan, USA, in 2013. His main research interests cover Machine Learning, Computer Vision, Image Processing, and their applications. He has authored and coauthored over 30 journal and conference papers and three monographs, and has co-invented patents. Dr. Nie was a reviewer for the IEEE Internet of Things Journal. He is a recipient of the Science and Technology Achievement Award of Xi'an City 2015.

    Haoxuan Ding was born in Xi’an City, China in 1995. He received the B.S. degree in Flight Vehicle Propulsion Engineering from Northwestern Polytechnical University, China, in 2018. He has been pursuing the M.S. degree with Northwestern Polytechnical University, Xi’an, China, since 2018. His current research interests include Generative Adversarial Network, object detection and their industrial applications.

    Manhua Qi was born in Xi’an City, China in 1994. She received the M.S. degree in software engineering from Northwestern Polytechnical University, Xi’an, China, in 2020. She currently works in ZTE Corporation as a 5G Software Engineer.

    Yifei Wang was born in 1995. He received the B.S. degree in software engineering from Northwestern Polytechnical University, Xi’an, China, in 2017, where he is currently pursuing the M.S. degree. His current research interests include image super-resolution and object detection.

    Edward K. Wong received his B.E. degree from the State University of New York at Stony Brook, his Sc.M. degree from Brown University, and his Ph.D. degree from Purdue University, all in Electrical Engineering. He is currently an associate professor in the Department of Computer Science and Engineering at the NYU Tandon School of Engineering, Brooklyn, NY. His research interests lie in the areas of computer vision, multimedia computing, medical image processing, and digital forensics, and he has published extensively in these areas. He has worked on many research projects funded by federal and state agencies, as well as private industry. He has served as an associate editor for the journal Information Sciences and the International Journal of Multimedia Intelligence and Security, and is currently an associate editor for the journal Springer LNCS Transactions on Data Hiding and Multimedia Security. Dr. Wong has also served on the organizing committees and technical program committees of numerous IEEE, ACM, and other international conferences.

    This work was supported by the 2020 key research and development plan of Shaanxi Province under Project 2020ZDLSF04-02.
