ABSTRACT
Image translation across diverse domains has attracted growing attention. Existing multi-domain image-to-image translation algorithms learn only the features of the complete image and do not consider the specific features of local instances. To translate important instances more realistically, we propose a cross-granularity learning model for multi-domain image-to-image translation. We provide detailed procedures for capturing instance features during learning, and we specifically learn the relationship between the style of the global image and the style of an instance within that image by enforcing cross-granularity consistency. In our design, a single generator suffices to perform instance-aware multi-domain image translation. Extensive experiments on several multi-domain image-to-image translation datasets show that the proposed method outperforms state-of-the-art approaches.
Index Terms
- Cross-Granularity Learning for Multi-Domain Image-to-Image Translation