Abstract:
Abundant spectral signatures and spatial contexts are key to hyperspectral image (HSI) classification. Existing convolutional neural networks (CNNs) focus only on local spatial context and lack the ability to learn global spectral sequence representations, whereas the transformer excels at learning global dependencies in sequential data. To address this issue, inspired by the transformer, we propose an interactive global spectral and local spatial feature fusion transformer, called ISSFormer. Specifically, we integrate self-attention and convolution in a parallel design, i.e., a multi-head self-attention (MHSA) mechanism and a local spatial perception (LSP) mechanism, so that ISSFormer learns local spatial and global spectral feature representations simultaneously. More significantly, we propose a bi-directional interaction mechanism (BIM) that exchanges features across the parallel branches to provide complementary cues. Through the BIM, the local spatial features and the global spectral features interact, emphasizing local spatial details and imposing spatial constraints that overcome spectral variability, further improving classification performance. Extensive experiments on three benchmark datasets, including Indian Pines, Pavia University, and WHU-Hi-HanChuan, show that ISSFormer achieves superior classification accuracy and visualization quality.
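The parallel design sketched in the abstract can be illustrated with a minimal NumPy toy: one branch applies multi-head self-attention over a token sequence (global spectral path), the other a local smoothing operation standing in for convolution (local spatial path), and a bi-directional exchange mixes each branch's output into the other before fusion. All details here (identity projections, an additive interaction weighted by `alpha`, additive fusion) are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mhsa(x, num_heads=2):
    # Global branch: multi-head self-attention over (tokens, dim).
    # Identity Q/K/V projections are a simplification for illustration.
    t, d = x.shape
    hd = d // num_heads
    out = np.zeros_like(x)
    for h in range(num_heads):
        q = k = v = x[:, h * hd:(h + 1) * hd]
        attn = softmax(q @ k.T / np.sqrt(hd))
        out[:, h * hd:(h + 1) * hd] = attn @ v
    return out

def local_perception(x, kernel=3):
    # Local branch: a per-channel moving average as a stand-in for
    # the convolutional local spatial perception (LSP) mechanism.
    pad = kernel // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([xp[i:i + kernel].mean(axis=0) for i in range(x.shape[0])])

def interactive_block(x, alpha=0.5):
    g = mhsa(x)              # global spectral features
    l = local_perception(x)  # local spatial features
    # Hypothetical additive bi-directional interaction (BIM):
    # each branch receives a weighted copy of the other's features.
    g2 = g + alpha * l
    l2 = l + alpha * g
    return g2 + l2           # fused representation

x = rng.normal(size=(8, 4))  # 8 tokens, 4 channels
y = interactive_block(x)
print(y.shape)  # (8, 4)
```

The point of the toy is the data flow, not the operators: both branches see the same input, and the interaction step lets local detail constrain the global features (and vice versa) before fusion.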
Published in: IEEE Transactions on Circuits and Systems for Video Technology (Volume: 34, Issue: 9, September 2024)