DOI: 10.1145/3581783.3613815

CTCP: Cross Transformer and CNN for Pansharpening

Published: 27 October 2023

Abstract

Pansharpening fuses a high-resolution panchromatic (PAN) image with a low-resolution multispectral (LRMS) image to produce an enhanced multispectral image with both high spectral and high spatial resolution. Current Transformer-based pansharpening methods neglect the interaction between the extracted long- and short-range features, resulting in spectral and spatial distortion in the fusion results. To address this issue, a novel cross Transformer and convolutional neural network (CNN) for pansharpening (CTCP) is proposed, which achieves better fusion results by designing a cross mechanism that enhances the interaction between long- and short-range features. First, a dual-branch feature extraction module (DBFEM) is constructed to extract features from the LRMS and PAN images separately, reducing aliasing between the two sets of image features. Within the DBFEM, a cross long-short-range feature module (CLSFM) is designed to improve the feature representation ability of the network; it combines the feature learning capabilities of the Transformer and the CNN via the cross mechanism, achieving the integration of long- and short-range features. Then, to improve spectral feature representation, a spectral feature enhancement fusion module (SFEFM) based on frequency channel attention is constructed to perform feature fusion. Finally, shallow features from the PAN image are reused to provide detail features, which are integrated with the fused features to obtain the final pansharpened result. To the best of our knowledge, this is the first attempt to introduce a cross mechanism between the Transformer and the CNN in the pansharpening field. Extensive experiments show that CTCP outperforms several state-of-the-art (SOTA) approaches both subjectively and objectively. The source code will be released at https://github.com/zhsu99/CTCP.
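The abstract does not give the exact formulation of the cross mechanism, but the idea of letting a local (CNN-style) branch and a global (Transformer-style) branch refine each other can be sketched in miniature. The following NumPy toy is an assumption-laden illustration, not the authors' implementation: `short_range` stands in for convolutional local features, `long_range` for single-head self-attention over pixels, and `cross_long_short` shows one plausible way each branch's output is enriched by the other before fusion. All three function names are hypothetical.

```python
import numpy as np

def short_range(x):
    # CNN-like local features: 3x3 average filter with zero padding.
    H, W = x.shape
    p = np.pad(x, 1)
    out = np.empty_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = p[i:i + 3, j:j + 3].mean()
    return out

def long_range(x):
    # Transformer-like global features: single-head self-attention
    # over the flattened pixels, each pixel treated as a 1-d token.
    v = x.reshape(-1, 1)                       # (HW, 1) tokens
    scores = v @ v.T / np.sqrt(v.shape[1])     # (HW, HW) similarities
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)    # row-wise softmax
    return (attn @ v).reshape(x.shape)

def cross_long_short(x):
    # Cross mechanism (hypothetical form): each branch's output is
    # refined by the complementary branch, then the two are fused.
    s, l = short_range(x), long_range(x)
    s_crossed = s + long_range(s)   # local features gain global context
    l_crossed = l + short_range(l)  # global features regain local detail
    return 0.5 * (s_crossed + l_crossed)

x = np.random.default_rng(0).random((8, 8))
y = cross_long_short(x)
print(y.shape)  # (8, 8)
```

The design point illustrated here is the interaction itself: rather than concatenating independent long- and short-range features at the end, each branch is re-processed through the other before fusion, which is what the paper argues prior Transformer-based methods omit.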


Cited By

  • (2025) Invertible Attention-Guided Adaptive Convolution and Dual-Domain Transformer for Pansharpening. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 18 (2025), 5217-5231. DOI: 10.1109/JSTARS.2025.3531353

Published In

MM '23: Proceedings of the 31st ACM International Conference on Multimedia
October 2023
9913 pages
ISBN:9798400701085
DOI:10.1145/3581783
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. convolutional neural network
  2. cross mechanism
  3. pansharpening
  4. transformer

Qualifiers

  • Research-article

Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa, ON, Canada

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
