DOI: 10.1145/3581783.3612492

Style Transfer Meets Super-Resolution: Advancing Unpaired Infrared-to-Visible Image Translation with Detail Enhancement

Published: 27 October 2023

Abstract

The problem of unpaired infrared-to-visible image translation has gained significant attention due to its ability to generate visible images with color information from low-detail grayscale infrared inputs. However, current methodologies often depend on conventional style transfer techniques, which constrain the spatial resolution of the visible output to be equivalent to that of the input infrared image. This fixed generation pattern produces blurry results when translating low-resolution infrared inputs, while using high-resolution infrared inputs as a workaround demands greater computational resources. This spurs us to investigate the challenging unpaired image translation from low-resolution infrared inputs to high-resolution visible outputs, with the ultimate goal of enhancing image details while reducing computational costs. We therefore propose a unified framework that integrates the super-resolution process into our unpaired infrared-to-visible image translation, yielding realistic and high-resolution results. Specifically, we propose a Detail Consistency Loss to establish a connection between the two aforementioned modules, thereby enhancing the quality of visual detail in style transfer results through the super-resolution module. Furthermore, our Texture Perceptual Loss is designed to ensure that the generator produces high-quality visual details accurately and reliably. Experimental results indicate that our method outperforms other comparative approaches when utilizing low-resolution infrared inputs; remarkably, our approach even surpasses techniques that use high-resolution infrared inputs to generate visible images. Finally, we propose a new and challenging dataset, dubbed InfraredCity-HD, which comprises 512×512-resolution images, to advance research on high-resolution infrared-related fields.
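
The abstract only names the framework's two couplings (a Detail Consistency Loss linking the style-transfer and super-resolution modules, and a Texture Perceptual Loss on the generated details) without giving their formulations. Below is a minimal, illustrative PyTorch sketch of one plausible reading; the `Translator`, `SRModule`, `gram`, and both loss functions are hypothetical stand-ins for this example, not the authors' actual architecture or losses.

```python
# Illustrative sketch only: module shapes and loss definitions are assumptions
# made for this example, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Translator(nn.Module):
    """Toy low-res infrared (1ch) -> low-res visible (3ch) generator (assumed)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, x):
        return self.net(x)

class SRModule(nn.Module):
    """Toy 2x super-resolution head using sub-pixel convolution (assumed)."""
    def __init__(self, scale=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale), nn.Tanh(),
        )

    def forward(self, x):
        return self.net(x)

def detail_consistency_loss(sr_hr, fake_lr):
    """One plausible 'detail consistency' term: the SR output, downsampled
    back to the translator's resolution, should agree with the translator's
    output, keeping the two modules coupled (assumption)."""
    down = F.interpolate(sr_hr, size=fake_lr.shape[-2:],
                         mode="bicubic", align_corners=False)
    return F.l1_loss(down, fake_lr)

def gram(f):
    """Channel-correlation (Gram) matrix, a standard texture statistic."""
    _, c, h, w = f.shape
    f = f.flatten(2)                      # (B, C, H*W)
    return f @ f.transpose(1, 2) / (c * h * w)

def texture_perceptual_loss(sr_hr, real_hr, feat):
    """A generic texture term for *unpaired* data: match Gram statistics of
    features rather than pixels. A pretrained VGG would be the usual feature
    extractor; a frozen random conv stack stands in here (assumption)."""
    return F.l1_loss(gram(feat(sr_hr)), gram(feat(real_hr)))

# Frozen stand-in feature extractor (a pretrained network in practice).
feat = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(16, 16, 3, padding=1)).eval()
for p in feat.parameters():
    p.requires_grad_(False)

translator, sr = Translator(), SRModule(scale=2)
ir_lr = torch.rand(1, 1, 128, 128)   # low-resolution infrared input
vis_hr = torch.rand(1, 3, 256, 256)  # unpaired high-resolution visible sample

fake_lr = translator(ir_lr)          # low-res visible-style output
fake_hr = sr(fake_lr)                # high-res visible output

# The adversarial/cycle losses of the unpaired-translation side are omitted.
loss = detail_consistency_loss(fake_hr, fake_lr) \
     + texture_perceptual_loss(fake_hr, vis_hr, feat)
loss.backward()
print(fake_hr.shape, float(loss))
```

Gram-based matching is chosen in this sketch because the high-resolution visible samples are unpaired with the infrared inputs, so only texture statistics, not pixel positions, can be meaningfully compared.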


Cited By

  • (2024) MappingFormer: Learning Cross-modal Feature Mapping for Visible-to-infrared Image Translation. Proceedings of the 32nd ACM International Conference on Multimedia, 10745-10754. DOI: 10.1145/3664647.3681375. Online publication date: 28-Oct-2024.


    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN: 9798400701085
    DOI: 10.1145/3581783

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. infrared-to-visible translation
    2. style transfer
    3. super-resolution

    Qualifiers

    • Research-article

    Funding Sources

    • National Key R&D Program of China
    • National Natural Science Foundation of China

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa, ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
