
Information Fusion

Volume 80, April 2022, Pages 205-225

Full length article
PoNet: A universal physical optimization-based spectral super-resolution network for arbitrary multispectral images

https://doi.org/10.1016/j.inffus.2021.10.016

Highlights

  • Generalized spectral super-resolution is presented for arbitrary multispectral images.

  • Incorporating physical degradation into CNN modeling improves model interpretability.

  • The proposed attention learns parameters channel-to-channel and speeds up the model.

  • Utilizing both deep and shallow features can further improve model performance.

Abstract

Spectral super-resolution is an important technique for obtaining hyperspectral images from multispectral images alone, which can effectively overcome the high acquisition cost and low spatial resolution of hyperspectral imaging. In practice, however, the channels or images captured by the same sensor often have different spatial resolutions, which poses a severe challenge to spectral super-resolution. This paper proposes a universal spectral super-resolution network based on physical optimization unfolding for arbitrary multispectral images, including single-resolution and cross-scale multispectral images. Furthermore, two new strategies are proposed to make full use of the spectral information, namely cross-dimensional channel attention and cross-depth feature fusion. Experimental results on five data sets show the superiority and stability of PoNet in addressing any spectral super-resolution situation.

Introduction

Hyperspectral (HS) imaging is a technique used to acquire the radiation characteristics of observed objects with a fine spectral resolution. With rich spectral information, hyperspectral images are used in many applications, such as semantic segmentation [1], [2], scene classification [3], [4], [5], [6], [7], object detection [8], [9], and target tracking [10], [11]. Because each pixel carries a continuous spectrum, hyperspectral images can improve the discriminability of objects and have attracted increasing attention in many fields, for example, food science [12], [13], atmosphere monitoring [14], [15], [16], medical science [17], [18], and remote sensing [2], [3], [4], [5], [6], [7].

Although hyperspectral images have been widely used, their high acquisition cost and low spatial resolution hinder finer applications, because the sensor area devoted to each pixel must increase to generate spectra with high signal-to-noise ratios [19]. In contrast, multispectral sensors usually capture high-spatial-resolution images with only a few spectral channels, which means rich spatial detail but limited spectral information. Thus, how to acquire high-resolution hyperspectral images from high-resolution multispectral images at low cost has attracted increasing attention. In other words, given a multispectral image, a hyperspectral image with the same spatial resolution and high spectral resolution can be obtained by increasing the number of channels of the multispectral image, which is called spectral super-resolution.
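As a notational sketch (the symbols here are illustrative, not taken from the paper), spectral super-resolution seeks a mapping

$$\hat{\mathbf{X}} = f_{\theta}(\mathbf{Y}), \qquad \mathbf{Y} \in \mathbb{R}^{H \times W \times c}, \quad \hat{\mathbf{X}} \in \mathbb{R}^{H \times W \times C}, \quad C \gg c,$$

where $\mathbf{Y}$ is the observed multispectral image with a few channels $c$ and $\hat{\mathbf{X}}$ is the estimated hyperspectral image with many channels $C$ at the same spatial resolution.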

To solve this ill-posed problem, many researchers first restored hyperspectral images by means of sparse recovery and dictionary learning, extracting a hyperspectral dictionary and sparse coefficients. Nguyen et al. [20] proposed a training-based spectral recovery method that improves a radial basis function network with RGB white-balancing to normalize the illumination. Robles-Kelly [21] then employed color and appearance information to achieve spectral super-resolution through a prototype set extracted from training samples using constrained sparse coding. Arad and Ben-Shahar [22] learned a hyperspectral dictionary with K-means Singular Value Decomposition (K-SVD) and described RGB images using the projected dictionary. Jia et al. [23] applied a nonlinear dimensionality reduction technique to natural spectra and mapped an RGB vector to its corresponding hyperspectral vector via a manifold-based method. Inspired by their previous work on spatial super-resolution, Aeschbacher and Wu et al. [24] proposed a new shallow method for enhancing the spectral resolution of RGB images. More recently, Akhtar et al. [25] employed Gaussian processes to improve the dictionary extracted through sparse representation. The main idea behind all these methods is to extract a hyperspectral dictionary from a set of hyperspectral images and recover spectra with coefficients calculated on multispectral images. Their modeling is similar to spectral unmixing, in which all variables have physical meanings: the dictionary corresponds to spectral endmembers and the sparse coefficients to fractional abundances. Nevertheless, the spectra of observed objects cannot be represented perfectly by a finite hyperspectral dictionary [26].
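One common formulation behind these dictionary-based methods (a hedged sketch; the exact objectives in [20], [21], [22], [23], [24], [25] differ) relates a hyperspectral pixel $\mathbf{x}$ and its multispectral observation $\mathbf{y}$ through a learned dictionary $\mathbf{D}$ and the sensor's spectral response $\mathbf{R}$:

$$\mathbf{x} \approx \mathbf{D}\boldsymbol{\alpha}, \qquad \mathbf{y} \approx \mathbf{R}\mathbf{D}\boldsymbol{\alpha}, \qquad \hat{\boldsymbol{\alpha}} = \arg\min_{\boldsymbol{\alpha}} \left\| \mathbf{y} - \mathbf{R}\mathbf{D}\boldsymbol{\alpha} \right\|_2^2 + \lambda \left\| \boldsymbol{\alpha} \right\|_1, \qquad \hat{\mathbf{x}} = \mathbf{D}\hat{\boldsymbol{\alpha}}.$$

In the unmixing analogy mentioned above, $\mathbf{D}$ plays the role of the spectral endmembers and $\hat{\boldsymbol{\alpha}}$ of the fractional abundances.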

Instead of extracting a hyperspectral dictionary, another category of methods aims to exploit the relationship between multispectral and hyperspectral images directly. Because the mapping is severely nonlinear, deep learning-based methods are employed to learn it [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38]. Inspired by the semantic segmentation architecture Tiramisu [39], Galliani et al. [40] proposed a deep DenseUnet with 56 convolutional layers. Further, Rangnekar et al. [41] applied a conditional adversarial framework to train the CNN. Borrowing from the spatial super-resolution task, Xiong et al. [42] adapted a very deep CNN for super-resolution (VDSR) to recover hyperspectral images. To further improve results, Shi et al. [43] used dense blocks with path-widening feature fusion. Fu et al. [44] designed a spatial–spectral CNN-based method that can jointly select the camera spectral sensitivity and learn to enhance the spectral resolution of RGB images. Also noting the importance of spectral response functions, Nie et al. [45] employed a 1 × 1 convolutional layer to learn them and thereby aid spectral super-resolution. To show that a moderately deep network can also achieve spectral super-resolution, Can et al. [46] proposed a 9-layer residual CNN with parametric ReLU. Zhang et al. [47] recently proposed a pixel-aware deep function-mixture network with multi-scale kernels to increase network flexibility, which can adaptively determine the receptive field size for each pixel. Notwithstanding the good performance that the deep learning-based methods mentioned above can achieve, they can only deal with input images without spatial degradation. In practical applications, however, the channels or images captured by the same satellite often have different spatial resolutions owing to the imaging process of the devices, such as Sentinel-2, WorldView-2, Gaofen-1, and Gaofen-2; these are called cross-scale multispectral images. In contrast, images whose channels share the same spatial resolution are called single-resolution images in this paper.
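To make the single-resolution, learning-based pipeline concrete, the following is a minimal sketch in PyTorch (the layer counts, channel numbers, and names are assumptions for illustration, not any of the cited architectures) of a residual CNN that maps an RGB image to a 31-band hyperspectral estimate:

    # Minimal sketch: residual CNN mapping a 3-channel image to a 31-band estimate.
    # All sizes are illustrative assumptions, not the cited networks.
    import torch
    import torch.nn as nn

    class TinySSRNet(nn.Module):
        def __init__(self, in_ch=3, out_ch=31, feat=64, depth=6):
            super().__init__()
            self.head = nn.Conv2d(in_ch, feat, kernel_size=3, padding=1)
            self.body = nn.Sequential(*[
                nn.Sequential(nn.Conv2d(feat, feat, 3, padding=1), nn.PReLU())
                for _ in range(depth)
            ])
            self.tail = nn.Conv2d(feat, out_ch, kernel_size=3, padding=1)

        def forward(self, y):            # y: (B, 3, H, W) multispectral/RGB input
            f = self.head(y)
            f = f + self.body(f)         # residual refinement of shallow features
            return self.tail(f)          # (B, 31, H, W) hyperspectral estimate

    x_hat = TinySSRNet()(torch.rand(1, 3, 64, 64))   # -> torch.Size([1, 31, 64, 64])

Such networks are trained end-to-end on paired multispectral–hyperspectral patches with a pixel-wise loss; as noted above, they assume that all input channels share one spatial resolution.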

There has been very little research considering spatial degradation in spectral super-resolution. Mei et al. [48] obtained high-spatial-resolution (HR), high-spectral-resolution images from low-spatial-resolution (LR) multispectral images using two similar CNNs for two stages. Although their method can enhance spatial detail as well as spectral resolution, it directly stacks convolutional layers without any physical meaning to find a mapping function between the input and output images. Furthermore, they only used images with one spatial resolution, whereas, as mentioned before, images with different spatial resolutions (lower or higher than that of the used channels) are often obtained even by the same satellite.

To make full use of images at different scales and account for physical degradation in deep learning-based spectral super-resolution, we propose a deep physical optimization-based CNN (PoNet) that addresses generalized spectral super-resolution for single-resolution or cross-scale multispectral images. As shown in Fig. 1, generalized spectral super-resolution includes traditional spectral super-resolution (SSR) and spectral super-resolution on cross-scale multispectral images. For spectral super-resolution with cross-scale spectral information, two subproblems need to be solved: how to use auxiliary lower-resolution channels, whose spectral information covers different wavelengths, to optimize the spectral super-resolution results (FusSR); and how to jointly enhance spatial and spectral resolution by introducing more high-resolution spatial details from panchromatic images (PansSR). The proposed PoNet handles both well. The main contributions of this paper can be summarized as follows:

  • For the first time, we define and address generalized spectral super-resolution for single-resolution or cross-scale spectral information, that is, SSR, FusSR, and PansSR. All these cases are discussed in the experiments.

  • By incorporating physical degradation into CNN modeling, PoNet can better exploit cross-scale spectral information and reconstruct hyperspectral images more finely. Constructing the model along the dataflow of optimization algorithms gives the network physical interpretability and helps people better understand how the CNN works.

  • To further improve model performance, we propose cross-dimensional channel attention, which learns parameters channel-to-channel while reducing the number of parameters (see the sketch after this list). Furthermore, we employ cross-depth feature fusion to ensure the effective utilization of deep and shallow features.
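As an illustration of the kind of channel-to-channel parameter learning described above, the following is a minimal sketch (an ECA-style 1-D convolution across the channel axis; the module name and kernel size are assumptions, not PoNet's exact design). It reweights each channel with only a handful of shared parameters instead of a fully connected excitation layer:

    # Hedged sketch of channel-to-channel attention: a small 1-D convolution is slid
    # across the channel dimension, so only k weights are learned in total.
    import torch
    import torch.nn as nn

    class ChannelToChannelAttention(nn.Module):
        def __init__(self, k=3):
            super().__init__()
            self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

        def forward(self, x):                          # x: (B, C, H, W) feature maps
            w = x.mean(dim=(2, 3))                     # global average pooling -> (B, C)
            w = self.conv(w.unsqueeze(1)).squeeze(1)   # 1-D conv across the channel axis
            return x * torch.sigmoid(w)[:, :, None, None]   # reweight each channel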

The remainder of this paper is organized as follows. Section 2 derives the spectral super-resolution algorithm considering spatial degradation and then introduces the proposed PoNet in detail. The data used in this article are introduced in Section 3. Section 4 presents experimental results on a natural image data set to verify the reliability of the proposed model, and PoNet is then compared with other methods in the two cases mentioned above. Finally, conclusions are drawn in Section 5.

Section snippets

Method

Considering the spatial degradation between multispectral and hyperspectral imaging modes, the observation model is proposed first. Based on this model, SSR on cross-scale images is formulated and optimized with a variational model-based algorithm. By unrolling the optimization algorithm into deep learning, PoNet is obtained, as depicted in Fig. 2. Cross-scale multispectral images are fed into the inverse imaging block and fusion layer to reconstruct initial results. Then, several recurrent optimization
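A hedged sketch of the kind of observation model and variational formulation this section refers to (the operators and notation are illustrative; the paper's exact model may differ): with the latent hyperspectral image $\mathbf{X}$ arranged as a bands-by-pixels matrix, the high-resolution and low-resolution channels can be modeled as

$$\mathbf{Y}_{\mathrm{hr}} = \mathbf{R}_{1}\mathbf{X} + \mathbf{N}_{1}, \qquad \mathbf{Y}_{\mathrm{lr}} = \mathbf{R}_{2}\mathbf{X}\mathbf{B}\mathbf{S} + \mathbf{N}_{2},$$

where $\mathbf{R}_1$ and $\mathbf{R}_2$ are spectral response operators, $\mathbf{B}$ is a spatial blur, $\mathbf{S}$ is a downsampling operator, and $\mathbf{N}_1$, $\mathbf{N}_2$ are noise terms. Reconstruction can then be posed as

$$\hat{\mathbf{X}} = \arg\min_{\mathbf{X}} \left\| \mathbf{Y}_{\mathrm{hr}} - \mathbf{R}_{1}\mathbf{X} \right\|_F^2 + \left\| \mathbf{Y}_{\mathrm{lr}} - \mathbf{R}_{2}\mathbf{X}\mathbf{B}\mathbf{S} \right\|_F^2 + \lambda\,\phi(\mathbf{X}),$$

and the iterative steps of a solver for this problem are unrolled into the recurrent optimization blocks of the network, with the prior $\phi$ replaced by learned convolutional modules.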

Data

In this section, we introduce two situations of spectral super-resolution considering spatial degradation: FusSR and PansSR. FusSR uses additional auxiliary lower-resolution spectral channels to obtain better spectral recovery, and PansSR handles the joint enhancement of spatial and spectral resolution with the help of high-resolution panchromatic images.
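For intuition, a hypothetical layout of inputs and outputs for the three settings is sketched below (the channel counts, sizes, and variable names are illustrative assumptions, not the actual data sets used in the paper):

    # Hypothetical tensor shapes (channels, height, width) for the three settings;
    # the numbers are illustrative, not those of the actual data sets.
    ssr_in,    ssr_out    = (4, 256, 256),                   (31, 256, 256)  # single-resolution MS -> HS
    fussr_in,  fussr_out  = [(4, 256, 256), (6, 128, 128)],  (31, 256, 256)  # HR MS + auxiliary LR channels
    panssr_in, panssr_out = [(4, 128, 128), (1, 256, 256)],  (31, 256, 256)  # LR MS + HR panchromatic image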

Experiments & results

In this section, we present experiments to verify the superiority of the proposed PoNet, including experiments under different multispectral imaging situations, an analysis of the sampling operators, and an ablation study. Details are as follows.

Conclusions

This paper presents a universal physical optimization-based CNN named PoNet to address spectral super-resolution for arbitrary multispectral images, including data with multiple spatial resolutions, namely SSR, FusSR, and PansSR. Unfolding an optimization algorithm that considers physical degradation into deep learning gives the CNN important physical interpretability, which greatly helps to recover hyperspectral information. Besides, to learn parameters channel-to-channel adaptively, as well as

CRediT authorship contribution statement

Jiang He: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Data curation, Writing – original draft, Writing – review & editing. Qiangqiang Yuan: Conceptualization, Resources, Writing – original draft, Supervision, Project administration, Funding acquisition. Jie Li: Conceptualization, Methodology, Resources, Data curation, Writing – original draft, Writing – review & editing, Supervision, Project administration, Funding acquisition. Liangpei Zhang:

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 41922008, Grant 62071341, and Grant 61971319; in part by the Hubei Science Foundation for Distinguished Young Scholars under Grant 2020CFA051; and in part by the Fundamental Research Funds for the Central Universities under Grant 531118010209.

References (65)

  • Li, J., et al., Semisupervised hyperspectral image segmentation using multinomial logistic regression with active learning, IEEE Trans. Geosci. Remote Sens. (2010).
  • Harsanyi, J.C., et al., Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach, IEEE Trans. Geosci. Remote Sens. (1994).
  • Camps-Valls, G., et al., Kernel-based methods for hyperspectral image classification, IEEE Trans. Geosci. Remote Sens. (2005).
  • Camps-Valls, G., et al., Advances in hyperspectral image classification: Earth monitoring with statistical learning methods, IEEE Signal Process. Mag. (2014).
  • Melgani, F., et al., Classification of hyperspectral remote sensing images with support vector machines, IEEE Trans. Geosci. Remote Sens. (2004).
  • Fauvel, M., et al., Spectral and spatial classification of hyperspectral data using SVMs and morphological profiles, IEEE Trans. Geosci. Remote Sens. (2008).
  • Ren, H., et al., Automatic spectral target recognition in hyperspectral imagery, IEEE Trans. Aerosp. Electron. Syst. (2003).
  • Manolakis, D., et al., Detection algorithms for hyperspectral imaging applications, IEEE Signal Process. Mag. (2002).
  • Nguyen, H.V., et al., Tracking via object reflectance using a hyperspectral video camera.
  • Wang, T., et al., Bio-inspired adaptive hyperspectral imaging for real-time target tracking, IEEE Sens. J. (2010).
  • Barnsley, M.J., et al., The PROBA/CHRIS mission: a low-cost smallsat for hyperspectral multiangle observations of the earth surface and atmosphere, IEEE Trans. Geosci. Remote Sens. (2004).
  • Lu, G., et al., Medical hyperspectral imaging: a review, J. Biomed. Opt. (2014).
  • Dian, R., et al., Deep hyperspectral image sharpening, IEEE Trans. Neural Netw. Learn. Syst. (2018).
  • Nguyen, R.M.H., et al., Training-based spectral reconstruction from a single RGB image.
  • Robles-Kelly, A., Single image spectral reconstruction for multimedia applications.
  • Arad, B., et al., Sparse recovery of hyperspectral signal from natural RGB images.
  • Jia, Y., et al., From RGB to spectrum for natural scenes via manifold-based mapping.
  • Wu, J., et al., In defense of shallow learned spectral reconstruction from RGB images.
  • Akhtar, N., et al., Hyperspectral recovery from RGB images using Gaussian processes, IEEE Trans. Pattern Anal. Mach. Intell. (2020).
  • Li, J., et al., Hybrid 2-D–3-D deep residual attentional network with structure tensor constraints for spectral super-resolution of RGB images, IEEE Trans. Geosci. Remote Sens. (2020).
  • Jiang, J., et al., Learning spatial-spectral prior for super-resolution of hyperspectral imagery, IEEE Trans. Comput. Imaging (2020).
  • Wang, X., et al., Hyperspectral image super-resolution via recurrent feedback embedding and spatial-spectral consistency regularization, IEEE Trans. Geosci. Remote Sens. (2021).