URCA-GAN: UpSample Residual Channel-wise Attention Generative Adversarial Network for image-to-image translation☆
Introduction
Image-to-image translation is the task of generating images in a target domain from images in a source domain via a learned mapping. Applications include image colorization [1], super-resolution image generation [2], [3], style transfer [4], and others. Researchers have proposed many deep learning methods for different image-to-image translation tasks; for example, changing a person's emotion, or changing summer scenery to winter scenery while keeping the same objects in the scene. Results for image-to-image generation have improved greatly in recent years due to the introduction of Generative Adversarial Networks (GANs) [5] for this task. A GAN-based method usually contains an encoder that maps the source image into a common latent feature space through convolutions, and a decoder that maps those latent features to a target-domain image through transposed convolutions.
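The encoder-decoder structure described above can be made concrete with the spatial-size arithmetic of strided convolutions. The layer counts, kernel sizes, and input resolution below are illustrative assumptions, not the exact architecture used in this paper:

```python
# Spatial-size bookkeeping in a typical GAN generator: stride-2
# convolutions in the encoder halve the feature-map size, and stride-2
# transposed convolutions in the decoder double it back.
# Hyperparameters here are hypothetical, for illustration only.

def conv_out(size, kernel=4, stride=2, pad=1):
    """Output size of a strided convolution (floor formula)."""
    return (size + 2 * pad - kernel) // stride + 1

def deconv_out(size, kernel=4, stride=2, pad=1):
    """Output size of a strided transposed convolution."""
    return (size - 1) * stride - 2 * pad + kernel

size = 128                      # assumed input resolution
for _ in range(2):              # encoder: two downsampling convolutions
    size = conv_out(size)
print(size)                     # latent spatial size: 32
for _ in range(2):              # decoder: two transposed convolutions
    size = deconv_out(size)
print(size)                     # restored to 128
```

Note that the transposed-convolution formula is the algebraic inverse of the convolution formula, which is why the decoder can exactly restore the input resolution.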
We note that for many image-to-image translation tasks, only parts of the image need to be transformed, not the entire image. For example, if we would like to change the emotion of a person from happy to angry, the changes should only be in the facial region; the hair color, hair style, and clothing should stay the same. Researchers have proposed many image-to-image translation algorithms to achieve this. One popular method is CycleGAN [6]. By enforcing cycle consistency between the source domain and the target domain, the common content in the source image and the target image can be retained during cross-domain translation. Other methods, such as DualGAN [7] and DiscoGAN [8], utilize a similar principle in their design to maintain the context in the image. To tackle translation among multiple domains, methods such as MUNIT [9] and StarGAN [10] have been proposed. Our method is based on the StarGAN network, which utilizes a single generator and a single discriminator to achieve multi-domain translation.
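The cycle-consistency idea behind CycleGAN-style methods can be sketched in a few lines; `G` and `F` below are toy stand-in mappings, not the learned generators:

```python
import numpy as np

# Minimal numpy sketch of cycle consistency: translating to the target
# domain (G) and back (F) should reproduce the source image, so shared
# content is preserved across the translation.

def cycle_consistency_loss(x, G, F):
    """L1 distance between x and F(G(x)), averaged over all pixels."""
    return np.mean(np.abs(F(G(x)) - x))

# Toy example: G shifts intensities and F shifts them back, so the
# cycle loss is (numerically) zero.
x = np.random.rand(3, 8, 8)        # a fake 3-channel "image"
G = lambda img: img + 0.5          # stand-in forward translation
F = lambda img: img - 0.5          # stand-in backward translation
print(cycle_consistency_loss(x, G, F))   # near zero
```

In the real training objective this term is added to the adversarial losses, weighted by a hyperparameter.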
In the translation process, a good method should be able to focus on the image regions where there are differences between the source image and the target image. This is analogous to the visual attention mechanism in the human visual system. Inspired by the human visual system and the successful applications of the attention mechanism in many computer vision algorithms, we propose an UpSample Residual Channel-wise Attention Generative Adversarial Network (URCA-GAN) for image-to-image translation. In the URCA-GAN system, we embed a novel module (UpSample Residual Channel-wise Attention Module, URCAM) after the decoder in the generator for feature filtering through a number of parallel UpSample Residual Channel-wise Attention Blocks (URCABs). URCAB is a neural network block based on ResNet [11]; it utilizes feature residuals to control the feature contents and extracts features of interest for translation through softmax channel-wise attention. The embedded URCAM is jointly trained with the StarGAN network to emphasize the features that are most important and discriminative in the channel feature maps.
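As a rough illustration of softmax channel-wise attention, the mechanism URCAB builds on, the following numpy sketch pools each channel to a scalar descriptor, softmaxes the descriptors into per-channel weights, and rescales the feature maps accordingly. The actual block in the paper also involves residual connections and learned layers; this only shows the attention mechanism itself:

```python
import numpy as np

def channel_softmax_attention(feat):
    """Reweight (C, H, W) feature maps by softmax channel attention."""
    pooled = feat.mean(axis=(1, 2))            # (C,) one scalar per channel
    e = np.exp(pooled - pooled.max())          # numerically stable softmax
    weights = e / e.sum()                      # (C,) weights summing to 1
    return feat * weights[:, None, None]       # emphasize strong channels

feat = np.random.rand(16, 8, 8)                # toy feature maps
out = channel_softmax_attention(feat)
print(out.shape)  # (16, 8, 8)
```

Because the weights sum to one, the softmax forces channels to compete: channels with stronger average activation are amplified relative to the rest, which is the "feature filtering" effect described above.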
Our contributions in this paper are as follows:
- We propose URCAB, a novel residual block that combines feature residuals with channel-wise attention to improve feature filtering, and we combine multiple URCABs into URCAM with a parallel structure.
- We embed the URCAM module into the generator of the StarGAN network and propose URCA-GAN for image-to-image translation.
- We show the different features extracted by the URCABs in URCAM and analyze their effect through visualization of feature maps.
- Our experimental results demonstrate that our method improves the quality and diversity of synthetic images.
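The parallel structure mentioned in the contributions can be sketched as several independent blocks filtering the same feature maps, with their outputs merged. The per-block operation and the summation merge below are simplifying stand-ins; the paper's actual URCABs are learned attention blocks:

```python
import numpy as np

def block(feat, w):
    """Stand-in for one parallel filtering block: per-channel rescaling."""
    return feat * w[:, None, None]

def parallel_module(feat, weight_sets):
    """Apply all blocks to the shared input and merge by summation."""
    return sum(block(feat, w) for w in weight_sets)

feat = np.random.rand(4, 8, 8)
weights = [np.random.rand(4) for _ in range(3)]   # three parallel blocks
merged = parallel_module(feat, weights)
print(merged.shape)  # (4, 8, 8)
```

The point of the parallel layout is that each block can specialize in a different feature while the merge preserves the feature-map shape for the rest of the generator.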
Section snippets
Generative Adversarial Networks (GANs)
Generative Adversarial Networks (GANs) [5] provide an effective way to generate images. A typical GAN model consists of two main modules: a generator and a discriminator. The discriminator is used to distinguish whether an input image is real or synthetic, while the generator tries its best to fool the discriminator into accepting synthetic images as real. With adversarial training between the generator and the discriminator, the ability of the
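The adversarial objective described above can be sketched with the standard (non-saturating) GAN losses; the discriminator outputs below are toy values, not results from this paper:

```python
import numpy as np

# D(x) is the discriminator's probability that its input is real.
# The discriminator minimizes d_loss; the generator minimizes g_loss,
# i.e. it tries to push D(G(z)) toward 1.

def d_loss(d_real, d_fake, eps=1e-8):
    """Discriminator loss: -log D(x) - log(1 - D(G(z)))."""
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))

def g_loss(d_fake, eps=1e-8):
    """Non-saturating generator loss: -log D(G(z))."""
    return -np.mean(np.log(d_fake + eps))

d_real = np.array([0.9, 0.8])   # D is confident on real images
d_fake = np.array([0.1, 0.2])   # D rejects synthetic images
print(d_loss(d_real, d_fake))   # low: the discriminator is winning
print(g_loss(d_fake))           # high: the generator must improve
```

Training alternates gradient steps on the two losses until, ideally, the discriminator can no longer separate real from synthetic images.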
Our approach
In this section, we present the details of our proposed URCA-GAN. The goal of our method is to produce a mapping from the source domain to the target domain. In the mapping, we only need to modify the content of the foreground at locations where there are differences between the source image and the target image. In practice, the residuals in ResNet represent the changes in features during the training of the neural networks. Inspired by this, we decide to control the residual features in
Datasets
We chose the CelebFaces Attributes (CelebA) [46] and the Radboud Faces Database (RaFD) [47] datasets. For the CelebA dataset, the training set contains face images and the test set contains images, randomly selected from the dataset. The initial size of the images is . They were cropped to a size of and then resized to . We define seven domains in our experiments for multi-domain translation: black hair, blond hair, brown hair, male, female, young, and old.
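The crop-then-resize preprocessing described above can be sketched as follows. The exact sizes were not preserved in this text, so the raw shape, `CROP`, and `TARGET` below are hypothetical placeholders, not the paper's settings:

```python
import numpy as np

CROP, TARGET = 160, 128          # hypothetical crop and resize sizes

def center_crop(img, size):
    """Take a centered size x size window from an (H, W, C) image."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

def resize_nearest(img, size):
    """Nearest-neighbour resize; enough to illustrate the pipeline."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

img = np.random.rand(218, 178, 3)    # assumed raw image shape
out = resize_nearest(center_crop(img, CROP), TARGET)
print(out.shape)  # (128, 128, 3)
```

Cropping first removes background at the image borders so that the subsequent resize spends its resolution budget on the face region.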
Conclusion
In this paper, we have proposed URCA-GAN, a novel deep learning network for upsampling in image-to-image translation. In URCA-GAN, we embed a module named URCAM, which is composed of a number of parallel URCABs, each a modified residual block with softmax channel-wise attention. A single URCAB is utilized to extract and refine a single feature related to the foreground in the feature maps. The enhanced features from different parallel URCABs are merged to form the output of URCAM. The quantitative
CRediT authorship contribution statement
Xuan Nie: Conceptualization, Methodology, Software, Writing - original draft, Writing - review & editing, Supervision. Haoxuan Ding: Conceptualization, Methodology, Software, Validation, Formal analysis, Writing - original draft. Manhua Qi: Software, Validation, Investigation, Resources. Yifei Wang: Validation, Investigation, Resources. Edward K. Wong: Writing - review & editing, Visualization, Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References (51)
- Y. Cao, Z. Zhou, W. Zhang, Y. Yu, Unsupervised diverse colorization via generative adversarial networks, in: Machine...
- C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A.P. Aitken, A. Tejani, J. Totz, Z. Wang, W....
- B. Wu, H. Duan, Z. Liu, G. Sun, SRPGAN: perceptual generative adversarial network for single image super resolution,...
- P. Isola, J. Zhu, T. Zhou, A.A. Efros, Image-to-image translation with conditional adversarial networks, in: 2017 IEEE...
- I.J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A.C. Courville, Y. Bengio, Generative...
- J. Zhu, T. Park, P. Isola, A.A. Efros, Unpaired image-to-image translation using cycle-consistent adversarial networks,...
- Z. Yi, H.R. Zhang, P. Tan, M. Gong, Dualgan: Unsupervised dual learning for image-to-image translation, in: IEEE...
- T. Kim, M. Cha, H. Kim, J.K. Lee, J. Kim, Learning to discover cross-domain relations with generative adversarial...
- X. Huang, M. Liu, S.J. Belongie, J. Kautz, Multimodal unsupervised image-to-image translation, in: Computer Vision -...
- Y. Choi, M. Choi, M. Kim, J. Ha, S. Kim, J. Choo, Stargan: Unified generative adversarial networks for multi-domain...
- Improved training of Wasserstein GANs
- Unsupervised image-to-image translation networks
Xuan Nie was born in 1976. He is an associate professor with the School of Software, Northwestern Polytechnical University, Xi’an, China. He received the B.S., M.S., and Ph.D. degrees in Automatic Control, Pattern Recognition, and Computer Application Technology from Northwestern Polytechnical University, Xi’an, China, in 1998, 2001, and 2005, respectively. He joined the School of Software and Microelectronics, Northwestern Polytechnical University, Xi’an, China, in 2006 as a lecturer, and has been an Associate Professor since 2010. He was a visiting professor at The Hong Kong Polytechnic University in 2010 and at the University of Michigan, USA, in 2013. His main research interests cover Machine Learning, Computer Vision, Image Processing, and their applications. He has authored and coauthored over 30 journal and conference papers, three monographs, and co-invented patents. Dr. Nie was a reviewer for the IEEE Internet of Things Journal. He is a recipient of the Science and Technology Achievement Award of Xi’an City, 2015.
Haoxuan Ding was born in Xi’an City, China in 1995. He received the B.S. degree in Flight Vehicle Propulsion Engineering from Northwestern Polytechnical University, China, in 2018. He has been pursuing the M.S. degree with Northwestern Polytechnical University, Xi’an, China, since 2018. His current research interests include Generative Adversarial Network, object detection and their industrial applications.
Manhua Qi was born in Xi’an City, China in 1994. She received the M.S. degree in software engineering from Northwestern Polytechnical University, Xi’an, China, in 2020. She currently works in ZTE Corporation as a 5G Software Engineer.
Yifei Wang was born in 1995. He received the B.S. degree in software engineering from Northwestern Polytechnical University, Xi’an, China, in 2017, where he is currently pursuing the M.S. degree. His current research interests include image super-resolution and object detection.
Edward K. Wong received his B.E. degree from the State University of New York at Stony Brook, his Sc.M. degree from Brown University, and his Ph.D. degree from Purdue University, all in Electrical Engineering. He is currently an associate professor in the Department of Computer Science and Engineering at the NYU Tandon School of Engineering, Brooklyn, NY. His research interests lie in the areas of computer vision, multimedia computing, medical image processing, and digital forensics, and he has published extensively in these areas. He has worked on many research projects funded by federal and state agencies, as well as private industry. He has served as an associate editor for the journal Information Sciences and the International Journal of Multimedia Intelligence and Security, and is currently an associate editor for the journal Springer LNCS Transactions on Data Hiding and Multimedia Security. Dr. Wong has also served on the organizing committee and technical program committee of numerous IEEE, ACM, and other international conferences.
☆ This work was supported by the 2020 key research and development plan of Shaanxi Province under Project 2020ZDLSF04-02.