Signal Processing, Volume 177, December 2020, 107737

RAFnet: Recurrent attention fusion network of hyperspectral and multispectral images

https://doi.org/10.1016/j.sigpro.2020.107737

Highlights

  • A recurrent attention fusion network (RAFnet) under a variational probabilistic framework is proposed for HS-MS fusion in an unsupervised manner.

  • Two autoencoders with a shared generative model are designed to explore spectral characteristics with a spectral extractor and spatial features with a spatial extractor.

  • A hierarchical RNN is designed to extract the abundant spectral characteristics of hyperspectral images.

  • A self-attention mechanism and a relation-attention mechanism, applied in conjunction with the RNN, are utilized to model long-range dependencies in spectral sequences.

Abstract

Hyperspectral imaging can reveal far more detailed information about real scenes than traditional imaging systems. However, due to hardware limitations, generally only low resolution hyperspectral (LrHs) and high resolution multispectral (HrMs) images can be acquired. This paper proposes a recurrent attention fusion network (RAFnet) under a variational probabilistic generative framework, which fuses the LrHs and HrMs images to generate a high resolution hyperspectral (HrHs) image in an unsupervised manner. Specifically, two variational autoencoders are designed to preserve both the spectral and spatial information of the LrHs and HrMs images, coupled through a shared decoder to generate hyperspectral images. Considering that the spectrum of each hyperspectral pixel is intrinsically a sequence-based data structure, we construct a hierarchical recurrent neural network to extract the abundant spectral information. Moreover, self-attention and relation-attention mechanisms are adopted to capture long-range dependencies along the spectral domain. The effectiveness and efficiency of RAFnet are evaluated on several publicly available hyperspectral datasets, in comparison with many state-of-the-art methods for the unsupervised fusion task.

Introduction

Hyperspectral images consist of continuous narrow spectral bands of a real scene captured by hyperspectral remote sensors, which deliver fine-grained information about distinct materials [1], [2], such as minerals in rocks, vegetation, synthetic materials and water. Hyperspectral image (HSI) analysis has become a thriving and active research area in computer vision with a wide range of applications [3], including object recognition and classification [4], [5], tracking [6], environmental monitoring [7], and change detection [8]. Usually, to achieve high spatial resolution, a hyperspectral sensor must have a small instantaneous field of view (IFOV), which reduces the signal-to-noise ratio (SNR) of the images. To improve the SNR, one has to widen the bandwidth so that more light enters, which in turn reduces the spectral resolution [9]. Therefore, due to the hardware limitations of imaging sensors, there is always an intrinsic tradeoff between spatial and spectral resolution. Hyperspectral images collect hundreds of contiguous bands that provide finer spectral details of different materials, but often suffer from significantly lower spatial resolution [10], [11]. Conversely, although multispectral images have high spatial resolution, their spectral resolution is relatively low. Images with both high spectral and high spatial resolution are highly desirable [12] for better recognition and analysis. A natural way to generate such high resolution hyperspectral (HrHs) images is to fuse low resolution hyperspectral (LrHs) images with high resolution multispectral (HrMs) images, often referred to as HS-MS fusion [13]. Hereafter, resolution refers specifically to spatial resolution.

Many HS-MS image fusion algorithms have been proposed over the last decades [14], [15], [16], [17], [18], [19], [20], [21]. HrHs images can be reconstructed by combining endmembers of LrHs images with abundances of HrMs images. Addressing the HS-MS fusion task with a linear spectral mixture model has drawn considerable attention due to its sound physical description [22], [23]. Popular approaches perform the fusion through linear factorization with the aid of different priors and regularizers, such as sparsity constraints [11], [22], [23], [24], [25]. Notwithstanding their good performance, these models are restricted by the linearity assumption on the spectral mixing process, which in reality is far more intricate than a linear model [26].

Recently, deep learning based fusion methods have attracted considerable research interest and achieved promising performance owing to their high non-linearity and strong representation ability, which makes them well suited to modeling the complex nonlinear relationship between LrHs and HrMs images in both the spatial and spectral domains [27]. Among typical deep learning models, convolutional neural network (CNN) based models draw much attention due to their great success in image processing; thus CNNs have been used to extract data characteristics of LrHs and HrMs images [18], [28], [29]. In addition, deep learning based methods are data-driven and can reconstruct HrHs images very quickly at inference time through a single feedforward pass. However, current CNN based HS-MS fusion methods still have evident drawbacks, as they usually adopt general-purpose image processing frameworks that lack specific interpretability for the HS-MS fusion task [13]. As expounded in [5], convolutional neural networks neglect the sequence-based data structure of hyperspectral images, leading to information loss. Thus, the abundant spectral information of hyperspectral images needs to be further explored by a sequential model to improve HS-MS fusion, since an HrHs image collected from a real scene always has hundreds of bands in the spectral domain. Popular sequential models, including recurrent neural networks (RNNs), especially long short-term memory (LSTM) [30] and gated recurrent units (GRUs) [31], have been firmly established as efficient approaches to sequence modeling. Another family of popular sequential models is based on transformers [32], [33], [34], whose direct connections between long-distance pairs are baked into attention mechanisms and enable the learning of long-term dependencies [32]. To process long contextual information in text, hierarchical recurrent neural networks have been employed for sequential modeling [35], [36].
Inspired by these sequential models, we apply hierarchical RNNs to model spectral sequences and draw global dependencies via an attention mechanism similar to transformers.
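To make the attention idea concrete, the following is a minimal NumPy sketch of scaled dot-product self-attention applied to a sequence of hidden states, e.g. one RNN state per spectral band group. All dimensions and weight matrices here are illustrative stand-ins, not the paper's actual parameters:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(H, Wq, Wk, Wv):
    """Scaled dot-product self-attention over a sequence of hidden states.

    H: (T, d) hidden states, e.g. one per spectral band group.
    """
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    d = Q.shape[-1]
    # (T, T) weights connect every pair of positions, however distant
    A = softmax(Q @ K.T / np.sqrt(d))
    return A @ V

rng = np.random.default_rng(0)
T, d = 16, 8                       # 16 band groups, 8-dim states (illustrative)
H = rng.standard_normal((T, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(H, Wq, Wk, Wv)
print(out.shape)                   # (16, 8)
```

Because every output position attends to every input position directly, the path length between any two bands is one, which is what enables the long-range dependency modeling discussed above.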

In addition, most deep learning based approaches are supervised models [18], [27], [28], [29], requiring the target HrHs image to be available for training, which is rarely the case in practice. In uSDN [37], an unsupervised deep learning network was first proposed to solve the HS-MS fusion problem; it used two autoencoders to extract spectral bases from the LrHs image and spatial representations from the HrMs image, and reconstructed the target HrHs image through a shared decoder. Nonetheless, uSDN optimizes the two autoencoders separately, and thus may not make full use of the interactions between the LrHs and HrMs images during fusion. Moreover, being built on fully-connected networks, uSDN ignores the spatial correlations in the spatial domain and the sequential spectral structure of hyperspectral images, limiting its representation capability.

As is well known, deep probabilistic generative models, such as deep belief networks [38], [39] and variational autoencoders [40], excel at revealing the underlying data distribution and naturally modeling diverse knowledge, and have shown excellent unsupervised expressive ability. From a probabilistic perspective, with stochastic variations injected in the latent space, such models are encouraged to capture intrinsic characteristics and facilitate the generation of long, information-rich sequences [35], [36]. Inspired by this, we propose a novel variational probabilistic recurrent attention fusion network, called RAFnet, for unsupervised HS-MS fusion. We reveal the underlying spectral representations of the LrHs image with a spectral extractor, and explore the corresponding neighborhoods in the HrMs image with a spatial extractor. To fully utilize the information of the LrHs and HrMs images, the spectral and spatial features extracted by the two extractors are fused together and then fed into a probabilistic generative model to reconstruct the target HrHs image. The main contributions of this work can be summarized as follows:

  • (1)

    We present a hierarchical recurrent attention neural network for HS-MS image fusion, which can effectively exploit the abundant spectral characteristics of hyperspectral images. To the best of our knowledge, this is the first use of a sequential model to extract the underlying spectral information for the HS-MS fusion task.

  • (2)

    We design an architecture composed of two recurrent variational autoencoders for unsupervised representation learning of the LrHs and HrMs images, where the spatial and spectral characteristics are fused together in the underlying latent space to reconstruct the HrHs image.

  • (3)

    Beyond the hierarchical recurrent mechanism, a self-attention mechanism and a relation-attention mechanism, used in conjunction with the recurrent neural network, are applied to model long-range dependencies regardless of the distance between spectral bands.

  • (4)

    With principled probabilistic modeling, the variational RAFnet is optimized jointly by maximizing the variational lower bound, leading to efficient inference that scales to large scenes.
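The variational objective referred to in contribution (4) can be sketched for a generic Gaussian variational autoencoder; the negative evidence lower bound (ELBO) is a reconstruction term plus a KL regularizer on the latent code. This is a minimal illustration with assumed Gaussian forms, not RAFnet's exact loss:

```python
import numpy as np

def gaussian_kl(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dims
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def neg_elbo(x, x_recon, mu, logvar, sigma2=1.0):
    # Gaussian reconstruction error (up to an additive constant) plus KL term;
    # minimizing this is equivalent to maximizing the variational lower bound
    recon = 0.5 * np.sum((x - x_recon) ** 2) / sigma2
    return recon + gaussian_kl(mu, logvar)

x = np.array([1.0, 0.5, -0.2])            # a toy "pixel spectrum"
loss = neg_elbo(x, 0.9 * x, np.zeros(2), np.ones(2))
```

Because both the reconstruction and KL terms decompose over pixels, the bound can be estimated on mini-batches, which is what makes inference scalable to large scenes.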

This paper is organized as follows. Section 2 describes the related algorithms for the HS-MS fusion and elementary knowledge of sequential models. Section 3 formulates the observation models. Section 4 presents the proposed RAFnet. Experimental results and discussions are presented in Section 5, and the conclusion is given in Section 6.

Section snippets

Traditional methods

Several HS-MS fusion algorithms have been proposed over the last decades [14], [16], [17], [19], [20], [21]. Utilizing spectral unmixing in HS-MS fusion has attracted considerable attention due to its straightforward interpretation of the fusion process. Unmixing-based fusion methods aim to obtain endmembers from the LrHs image and abundances from the HrMs image, respectively, under the constraints of the relative sensor characteristics. The fused HrHs image can then be reconstructed as the product
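The endmember-abundance reconstruction used by unmixing-based methods can be sketched in a few lines: the fused image is the product of an endmember matrix (spectral signatures from the LrHs image) and an abundance matrix (per-pixel mixing fractions from the HrMs image). All sizes below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
S, p = 100, 6          # hyperspectral band count and number of endmembers
H, W = 64, 64          # high spatial resolution of the HrMs image

E = rng.random((S, p))                  # endmembers estimated from the LrHs image
A = rng.dirichlet(np.ones(p), H * W).T  # abundances from the HrMs image,
                                        # (p, H*W); each pixel's fractions sum to 1
Z = E @ A                               # fused HrHs image, (S, H*W);
                                        # reshape to (H, W, S) for display
print(Z.shape)  # (100, 4096)
```

The Dirichlet draw enforces the usual sum-to-one and non-negativity constraints on abundances; real methods estimate E and A by constrained factorization rather than sampling them.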

Problem formulation

Let X_l ∈ ℝ^{h×w×S} denote the acquired LrHs image, with h and w as its height and width in the spatial dimension, and S as its number of bands in the spectral dimension. X_m ∈ ℝ^{H×W×s} denotes the available HrMs image of the same scene, with H and W as its height and width, and s as its number of bands. In general, the HrMs image has much higher spatial resolution than the LrHs image, i.e., H > h, W > w, while the LrHs image has much higher spectral resolution than the HrMs image, i.e., S > s.
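The two observations are commonly modeled as degraded versions of the unknown HrHs image Z: spatial blurring/downsampling yields X_l, and a spectral response matrix yields X_m. A minimal NumPy sketch, with block averaging standing in for the spatial degradation and a uniform band-averaging response standing in for the sensor's spectral response (both assumptions, not the paper's exact operators):

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, S = 64, 64, 100      # target HrHs size (illustrative)
r, s = 4, 4                # spatial downsampling ratio and MS band count
Z = rng.random((H, W, S))  # unknown HrHs image

# LrHs: spatial degradation, here simple r x r block averaging
Xl = Z.reshape(H // r, r, W // r, r, S).mean(axis=(1, 3))  # (h, w, S)

# HrMs: spectral degradation via a band-averaging response matrix R of shape (s, S)
R = np.zeros((s, S))
for i in range(s):
    R[i, i * (S // s):(i + 1) * (S // s)] = s / S          # each row sums to 1
Xm = Z @ R.T                                               # (H, W, s)
print(Xl.shape, Xm.shape)  # (16, 16, 100) (64, 64, 4)
```

Fusion methods invert this forward model: they seek a Z that is consistent with both degraded observations.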

Proposed RAFnet

The overall simplified architecture of RAFnet is shown in Fig. 1. The whole architecture can be viewed as two variational probabilistic autoencoders for representation learning of the LrHs and HrMs images, respectively. It is composed of three parts: the LrHs encoder and the HrMs encoder (both inference models) and the shared decoder (the generative model); we sketch the three components briefly here. First, the LrHs encoder extracts the latent spectral representation Zl of the LrHs image Xl through a
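The two-encoder/shared-decoder coupling described above can be sketched per pixel with plain dense layers. This is only a structural illustration with random stand-in weights; the actual RAFnet encoders use recurrent attention and convolutional extractors rather than single dense layers:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, W, b):
    return np.maximum(x @ W + b, 0.0)   # ReLU layer

# Illustrative dimensions: S hyperspectral bands, s multispectral bands,
# d-dimensional shared latent space. All weights are random stand-ins.
S, s, d = 100, 4, 16
Wl, bl = rng.standard_normal((S, d)) * 0.1, np.zeros(d)   # LrHs encoder
Wm, bm = rng.standard_normal((s, d)) * 0.1, np.zeros(d)   # HrMs encoder
Wd, bd = rng.standard_normal((d, S)) * 0.1, np.zeros(S)   # shared decoder

xl_pixel = rng.random(S)     # one LrHs pixel spectrum
xm_pixel = rng.random(s)     # one HrMs pixel

zl = dense(xl_pixel, Wl, bl)            # latent spectral code
zm = dense(xm_pixel, Wm, bm)            # latent spatial code

# The shared decoder ties both branches to one generative model:
recon_from_l = dense(zl, Wd, bd)        # reconstructs an S-band spectrum
recon_from_m = dense(zm, Wd, bd)        # same decoder weights, HrMs branch
print(recon_from_l.shape, recon_from_m.shape)  # (100,) (100,)
```

Sharing Wd across branches is what couples the two autoencoders, so the latent codes from both modalities must decode into the same hyperspectral space.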

Datasets and experimental setup

(1) Indian Pines: the hyperspectral data were acquired over the Indian Pines test site by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor in 1996, with 224 spectral bands in the 0.4–2.5 µm region [1]. The original image covers 512 × 614 pixels with 20 m spatial resolution, and we select a 145 × 145-pixel image as the reference image. Following [22], the HrMs data were produced with uniform spectral response functions corresponding to Landsat TM bands 1–5 and 7, which cover the 450–520,

Conclusion

In this paper, we have presented a novel recurrent attention fusion network for the task of unsupervised HS-MS fusion in an end-to-end fashion. We apply a variational hierarchical recurrent network in the spectral extractor to model the intrinsic latent spaces, treating each pixel of a hyperspectral image as sequential data. At the same time, a spatial feature extractor composed of three convolutional layers is utilized to explore the spatial correlations in HrMs images. Furthermore, the

CRediT authorship contribution statement

Ruiying Lu: Conceptualization, Methodology, Software, Writing - original draft. Bo Chen: Supervision, Project administration, Funding acquisition, Resources. Ziheng Cheng: Investigation, Validation, Data curation. Penghui Wang: Supervision, Funding acquisition, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no conflict of interest, financial or otherwise, in connection with the submitted work.

Acknowledgment

Bo Chen is partially supported by the 111 Project (No. B18039), NSFC (61771361) and Shaanxi Innovation Team Project; Penghui Wang is supported in part by NSFC (61701379).

References (62)

  • N. Akhtar et al.

    Sparse spatio-spectral representation for hyperspectral image super-resolution

    Proceedings of the European Conference on Computer Vision

    (2014)
  • N. Yokoya et al.

    Hyperspectral and multispectral data fusion: a comparative review of the recent literature

    IEEE Geosci. Remote Sens. Mag.

    (2017)
  • X. Cao et al.

    Hyperspectral image classification with Markov random fields and a convolutional neural network

    IEEE Trans. Image Process

    (2018)
  • A. Chakrabarti et al.

    Statistics of real-world hyperspectral images

    Proceedings of the Computer Vision and Pattern Recognition (CVPR)

    (2011)
  • M. Fauvel et al.

    Advances in spectral-spatial classification of hyperspectral images

    Proc. IEEE

    (2013)
  • L. Mou et al.

    Deep recurrent neural networks for hyperspectral image classification

    IEEE Trans. Geosci. Remote Sens.

    (2017)
  • A. Plaza et al.

    Foreword to the special issue on spectral unmixing of remotely sensed data

    IEEE Trans. Geosci. Remote Sens.

    (2017)
  • L.H. Spangler et al.

    A shallow subsurface controlled release facility in Bozeman, Montana, USA, for testing near surface CO2 detection techniques and transport models

    Environ. Earth Sci.

    (2010)
  • H. Kwon et al.

    Kernel matched signal detectors for hyperspectral target detection

    Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)

    (2005)
  • R.C. Patel et al.

    Super-resolution of hyperspectral images using compressive sensing based approach

    Remote Sens. Spatial Inf. Sci.

    (2012)
  • R. Kawakami et al.

    High-resolution hyperspectral imaging via matrix factorization

    Proceedings of the CVPR

    (2011)
  • C. Lanaras et al.

    Hyperspectral super-resolution by coupled spectral unmixing

    Proceedings of the IEEE International Conference on Computer Vision (ICCV)

    (2015)
  • G. Vivone et al.

    A critical comparison among pansharpening algorithms

    IEEE Trans. Geosci. Remote Sens.

    (2015)
  • Q. Xie et al.

    Multispectral and Hyperspectral Image Fusion by Ms/Hs Fusion Net

    (2019)
  • X. Li et al.

    Hyperspectral and multispectral image fusion based on band simulation

    IEEE Geosci. Remote. Sens. Lett.

    (2020)
  • X. Li et al.

    Hyperspectral and multispectral image fusion via nonlocal low-rank tensor approximation and sparse representation

    IEEE Trans. Geosci. Remote Sens.

    (2020)
  • R. Dian et al.

    Deep hyperspectral image sharpening

    IEEE Trans. Neural Networks Learn. Syst.

    (2018)
  • Y. Yuan et al.

    Hyperspectral and multispectral image fusion using non-convex relaxation low rank and total variation regularization

    Proceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS)

    (2020)
  • Q. Li et al.

    Mixed 2d/3d convolutional network for hyperspectral image super-resolution

    Remote. Sens.

    (2020)
  • R. Dian et al.

    Hyperspectral image super-resolution via non-local sparse tensor factorization

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR

    (2017)
  • R. Dian et al.

    Nonlocal sparse tensor factorization for semiblind hyperspectral and multispectral image fusion.

    IEEE Trans. Cybern.

    (2019)
  • S. Li et al.

    Fusing hyperspectral and multispectral images via coupled sparse tensor factorization

    IEEE Trans. Image Process.

    (2018)
  • N. Yokoya et al.

    Coupled nonnegative matrix factorization unmixing for hyperspectral and multispectral data fusion

    IEEE Trans. Geosci. Remote Sens.

    (2012)
  • X.X. Zhu et al.

    Exploiting joint sparsity for pansharpening: the j-sparsefi algorithm

    IEEE Trans. Geosci. Remote Sens.

    (2016)
  • N. Akhtar et al.

    Bayesian sparse representation for hyperspectral image super resolution

    Proceedings of the IEEE Computer Vision and Pattern Recognition (CVPR)

    (2015)
  • M. Simoes et al.

    A convex formulation for hyperspectral image superresolution via subspace-based regularization

    IEEE Trans. Geosci. Remote Sens.

    (2015)
  • R. Heylen et al.

    A review of nonlinear hyperspectral unmixing methods

    IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens.

    (2014)
  • J. Yang et al.

    Hyperspectral and multispectral image fusion via deep two-Branches convolutional neural network

    Remote Sens. (Basel)

    (2018)
  • F. Palsson et al.

    Multispectral and hyperspectral image fusion using a 3-d-convolutional neural network

    IEEE Geosci. Remote Sens. Lett.

    (2017)
  • J. Kim et al.

    Accurate image super-resolution using very deep convolutional networks

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)

    (2016)
  • S. Hochreiter et al.

    Long short-term memory

    Neural Comput.

    (1997)