Abstract:
The key to hyperspectral image (HSI) and multispectral image (MSI) fusion is to exploit the inter-spectral self-similarities of HSIs and the spatial correlations of MSIs. However, leading convolutional neural network (CNN)-based methods fall short in capturing long-range dependencies and the self-similarity prior. To this end, we propose a simple yet efficient Transformer-based network, the hyperspectral and multispectral image fusion (HMF)-Former, for HSI/MSI fusion. The HMF-Former adopts a U-shaped architecture with a spatio-spectral Transformer block (SSTB) as the basic unit. In the SSTB, the embedded spatial-wise multihead self-attention (Spa-MSA) and spectral-wise multihead self-attention (Spe-MSA) effectively capture interactions among spatial regions and inter-spectral dependencies, respectively, matching the spatial-correlation property of MSIs and the inter-spectral self-similarity property of HSIs. In addition, the specially designed SSTB enables the HMF-Former to capture both local and global features while maintaining linear complexity. Extensive experiments on four benchmark datasets show that our method significantly outperforms state-of-the-art methods.
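To illustrate how spectral-wise attention can stay linear in the number of pixels, the sketch below shows one plausible Spe-MSA formulation in PyTorch, where channels act as tokens so the attention map is only C x C. This is not the authors' code: the module name `SpectralMSA`, the layer choices, and the Restormer-style normalization are assumptions for illustration; the paper's exact design may differ.

```python
# Minimal sketch (assumed implementation, not the authors' code) of
# spectral-wise multi-head self-attention: attention is computed across
# channels, so its cost grows linearly with the spatial size H*W.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpectralMSA(nn.Module):
    """Hypothetical spectral-wise MSA layer; names and shapes are illustrative."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.heads = heads
        self.to_qkv = nn.Conv2d(dim, dim * 3, kernel_size=1, bias=False)
        self.scale = nn.Parameter(torch.ones(heads, 1, 1))  # learnable temperature
        self.proj = nn.Conv2d(dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q, k, v = self.to_qkv(x).chunk(3, dim=1)

        # Reshape to (batch, heads, channels_per_head, pixels): channels are tokens.
        def split(t: torch.Tensor) -> torch.Tensor:
            return t.reshape(b, self.heads, c // self.heads, h * w)

        q, k, v = map(split, (q, k, v))
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale  # (b, heads, c/h, c/h): C x C map
        out = attn.softmax(dim=-1) @ v                 # linear in h * w
        out = out.reshape(b, c, h, w)
        return self.proj(out)


if __name__ == "__main__":
    # Quick shape check on a dummy feature map.
    x = torch.randn(1, 32, 64, 64)
    print(SpectralMSA(dim=32)(x).shape)  # torch.Size([1, 32, 64, 64])
```

A complementary Spa-MSA would instead treat pixels (e.g., within local windows) as tokens to model interactions among spatial regions; combining the two in one block is what the abstract refers to as the SSTB.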
Published in: IEEE Geoscience and Remote Sensing Letters (Volume: 20)