DOI: 10.1145/3394171.3413656

Cross-Granularity Learning for Multi-Domain Image-to-Image Translation

Published: 12 October 2020

ABSTRACT

Image translation across diverse domains has attracted increasing attention. Existing multi-domain image-to-image translation algorithms learn only the features of the complete image, without considering the specific features of local instances. To ensure that important instances are translated more realistically, we propose a cross-granularity learning model for multi-domain image-to-image translation. We provide detailed procedures to capture instance features during learning, and we explicitly learn the relationship between the style of the global image and the style of an instance within it by enforcing cross-granularity consistency. In our design, only one generator is needed to perform instance-aware multi-domain image translation. Extensive experiments on several multi-domain image-to-image translation datasets show that our method achieves superior performance compared with state-of-the-art approaches.
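As a rough illustration of the cross-granularity consistency described above, the following sketch compares the style code of the whole image with the style code of a cropped instance. This is not the authors' published code: the style encoder, the crop helper, and the L1 penalty are all assumptions made for illustration.

import torch
import torch.nn.functional as F

def crop_instance(img: torch.Tensor, box) -> torch.Tensor:
    # Crop the instance region (x1, y1, x2, y2) from an NCHW image batch.
    x1, y1, x2, y2 = box
    return img[:, :, y1:y2, x1:x2]

def cross_granularity_consistency(style_encoder, img, box):
    # Hypothetical consistency term: penalize disagreement between the
    # global style code and the style code of the instance crop, so the
    # instance's style follows the style of the image it belongs to.
    global_style = style_encoder(img)
    # Resize the crop so the same style encoder can process it.
    crop = F.interpolate(crop_instance(img, box), size=img.shape[-2:],
                         mode="bilinear", align_corners=False)
    instance_style = style_encoder(crop)
    return F.l1_loss(global_style, instance_style)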


Supplemental Material

3394171.3413656.mp4

Existing multi-domain image-to-image translation algorithms learn only the features of the complete image, without considering the specific features of local instances. To ensure that important instances are translated more realistically, we propose cross-granularity learning. To effectively capture the features of an important instance in the image being translated, we introduce a novel cross-granularity consistency loss that guides the cross-granularity learning. Because the style of the whole image affects the style of an object, this loss regularizes training so that the relationship between the global style and the object style is effectively captured and the translated image is more realistic. Adversarial and cross-cycle consistency losses on both the global image and the instance images are also applied during training; a sketch of how these terms can be combined appears after the file details below. Extensive experiments show that our method achieves superior performance compared with state-of-the-art approaches, both qualitatively and quantitatively.

mp4

18.7 MB
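To make the training recipe described above concrete, here is a minimal sketch of how the named loss terms could be combined into a single objective. The loss weights and the exact adversarial and cross-cycle formulations are assumptions for illustration, not the paper's published settings.

def total_objective(adv_global, adv_instance,
                    cyc_global, cyc_instance,
                    cross_granularity,
                    w_adv=1.0, w_cyc=10.0, w_cg=1.0):
    # Weighted sum of the losses named in the text: adversarial and
    # cross-cycle consistency losses on both the global image and the
    # instance image, plus the cross-granularity consistency loss.
    # The weights here are illustrative defaults, not the paper's values.
    return (w_adv * (adv_global + adv_instance)
            + w_cyc * (cyc_global + cyc_instance)
            + w_cg * cross_granularity)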


Published in

      MM '20: Proceedings of the 28th ACM International Conference on Multimedia
      October 2020
      4889 pages
ISBN: 9781450379885
DOI: 10.1145/3394171

      Copyright © 2020 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 12 October 2020


      Qualifiers

      • research-article

      Acceptance Rates

Overall acceptance rate: 995 of 4,171 submissions, 24%

