Abstract:
Fusing the complementary and heterogeneous properties of multimodal data, such as hyperspectral (HS), LiDAR, and synthetic aperture radar (SAR) data, can significantly improve the accuracy of joint classification of remote sensing (RS) images. We therefore propose a spatial–spectral bilinear representation fusion network (S²BRFNet), which captures long-range dependencies both across modalities and within each modality to perform the final joint classification. First, a cross-modal spatial–spectral representation module (S²RM) is designed. It applies spatial–spectral attention and self-attention across heterogeneous data to strengthen the representation of cross-modal complementary properties and of the spatial–spectral features of each single-source input. Second, a semantic space-guided bilinear feature fusion module (S²BFM) is developed, which combines deep and shallow features to recover fine-grained features: shallow location details improve the semantic predictions of deep features. It also exploits the differing representation capabilities of different layers for objects with markedly different features, amplifying each layer's feature advantages so that rich global context information is obtained. Finally, a semantic space re-weight strategy guides the outer-product fusion of heterogeneous features, which improves the network's ability to discriminate between similar features. Classification experiments on four common datasets with different modality combinations (HS-SAR-DSM Augsburg, Berlin, Trento, and Muufl) demonstrate the superiority of S²BRFNet.
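The abstract gives no implementation details, so the following is only a minimal sketch of two generic building blocks it names: cross-modal attention, where tokens from one modality query another, and outer-product (bilinear) fusion of two feature vectors after a channel re-weighting step. It is not S²BRFNet itself. The function names, tensor sizes, random projection matrices, the sigmoid re-weighting that stands in for the semantic space re-weight strategy, and the signed-square-root plus L2 normalization (a common bilinear-pooling practice) are all assumptions made for illustration.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Numerically stable softmax: subtract the maximum before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def cross_modal_attention(q_tokens, kv_tokens, wq, wk, wv):
    """Scaled dot-product attention: queries come from one modality
    (e.g. HS patch tokens), keys and values from another (e.g. LiDAR tokens).
    The projection matrices wq, wk, wv are hypothetical learned weights."""
    q = q_tokens @ wq                          # (n_q, d)
    k = kv_tokens @ wk                         # (n_kv, d)
    v = kv_tokens @ wv                         # (n_kv, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_q, n_kv)
    return softmax(scores, axis=-1) @ v        # (n_q, d)


def bilinear_fusion(feat_a, feat_b, channel_weights):
    """Outer-product (bilinear) fusion of two pooled feature vectors.
    channel_weights rescales feat_a's channels first; here it is only a
    stand-in for the paper's semantic space re-weighting (assumption)."""
    z = np.outer(feat_a * channel_weights, feat_b).ravel()  # (c_a * c_b,)
    z = np.sign(z) * np.sqrt(np.abs(z))                     # signed square root
    return z / (np.linalg.norm(z) + 1e-12)                  # L2 normalization


# Usage with hypothetical sizes: a 7x7 patch gives 49 tokens per modality.
rng = np.random.default_rng(0)
hs = rng.standard_normal((49, 32))      # HS tokens, 32 channels
lidar = rng.standard_normal((49, 8))    # LiDAR tokens, 8 channels
d = 16
wq = rng.standard_normal((32, d))
wk = rng.standard_normal((8, d))
wv = rng.standard_normal((8, d))

attended = cross_modal_attention(hs, lidar, wq, wk, wv)  # (49, 16)
hs_vec = hs.mean(axis=0)                 # global average pooling -> (32,)
lidar_vec = attended.mean(axis=0)        # (16,)
weights = 1.0 / (1.0 + np.exp(-hs_vec))  # hypothetical sigmoid re-weighting
fused = bilinear_fusion(hs_vec, lidar_vec, weights)      # (512,)
```

The outer product pairs every channel of one modality with every channel of the other, so the fused vector has c_a × c_b entries (32 × 16 = 512 here). These pairwise interactions are the usual motivation for bilinear fusion when classes look similar within a single modality, at the cost of a much larger feature dimension.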
Published in: IEEE Transactions on Geoscience and Remote Sensing (Volume 61)