Abstract
Remote Sensing (RS) has been widely used in Earth Observation (EO) missions, including land cover classification and environmental monitoring. Compared with the natural images used in general computer vision tasks, remote sensing data are harder to collect. To fully exploit the available data and leverage the complementary information across different data sources, we propose the Multimodal Transformer for Remote Sensing (RsMmFormer), an image classification approach that uses both Hyperspectral Image (HSI) and Light Detection and Ranging (LiDAR) data. The conventional Vision Transformer (ViT) lacks the inductive biases of convolutions, so we add convolutional layers to RsMmFormer to retain the favorable characteristics of convolutional neural networks (CNNs). We further introduce the Multi-scale Multi-head Self-Attention (MSMHSA) module, which learns fine-grained representations and thereby helps detect small targets that occupy only a few pixels in a remote sensing image. The MSMHSA module fuses HSI and LiDAR data progressively and at a fine level of detail, using self-attention to attend to both global and local contexts. Comprehensive experiments on the widely used Trento and MUUFL benchmarks demonstrate the effectiveness and superiority of the proposed RsMmFormer for remote sensing image classification.
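The multi-scale attention idea described above can be illustrated with a minimal NumPy sketch. This is not the authors' MSMHSA implementation: the abstract does not specify how scales are formed or how the two modalities are fused, so the sketch assumes that each head attends to keys and values average-pooled at a different spatial scale, and that HSI and LiDAR are fused by channel concatenation followed by a per-pixel linear embedding. The names `avg_pool_tokens` and `multiscale_self_attention`, the patch size, the band count, and the random weights are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def avg_pool_tokens(tokens, side, scale):
    """Average-pool a (side*side, dim) token grid by `scale` along each spatial axis."""
    dim = tokens.shape[-1]
    grid = tokens.reshape(side // scale, scale, side // scale, scale, dim)
    return grid.mean(axis=(1, 3)).reshape(-1, dim)

def multiscale_self_attention(tokens, side, scales, rng):
    """One head per scale (assumption): queries stay at full resolution,
    keys and values come from the token grid pooled at that head's scale."""
    _, dim = tokens.shape
    head_dim = dim // len(scales)
    outputs = []
    for scale in scales:
        # Random projection weights stand in for learned parameters.
        wq, wk, wv = (rng.standard_normal((dim, head_dim)) / np.sqrt(dim)
                      for _ in range(3))
        context = avg_pool_tokens(tokens, side, scale)  # coarser for larger scale
        q, k, v = tokens @ wq, context @ wk, context @ wv
        attn = softmax(q @ k.T / np.sqrt(head_dim))     # (pixels, pooled tokens)
        outputs.append(attn @ v)
    return np.concatenate(outputs, axis=-1)             # (pixels, dim)

rng = np.random.default_rng(0)
side, bands, dim = 8, 16, 32                  # illustrative sizes only
hsi = rng.standard_normal((side, side, bands))    # synthetic HSI patch
lidar = rng.standard_normal((side, side, 1))      # synthetic LiDAR patch
stacked = np.concatenate([hsi, lidar], axis=-1).reshape(side * side, bands + 1)
w_embed = rng.standard_normal((bands + 1, dim)) / np.sqrt(bands + 1)
tokens = stacked @ w_embed                    # per-pixel embedding (a 1x1 convolution)
fused = multiscale_self_attention(tokens, side, scales=(1, 2, 4, 8), rng=rng)
print(fused.shape)
```

In this sketch the head with scale 1 sees every pixel and so keeps local detail, while the head with scale 8 sees a single pooled token that summarizes the whole patch. That is one simple way to combine local and global context in a single attention layer.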
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhang, B., Ming, Z., Liu, Y., Feng, W., He, L., Zhao, K. (2024). RsMmFormer: Multimodal Transformer Using Multiscale Self-attention for Remote Sensing Image Classification. In: Fang, L., Pei, J., Zhai, G., Wang, R. (eds) Artificial Intelligence. CICAI 2023. Lecture Notes in Computer Science, vol 14473. Springer, Singapore. https://doi.org/10.1007/978-981-99-8850-1_27
DOI: https://doi.org/10.1007/978-981-99-8850-1_27
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8849-5
Online ISBN: 978-981-99-8850-1
eBook Packages: Computer Science, Computer Science (R0)