RsMmFormer: Multimodal Transformer Using Multiscale Self-attention for Remote Sensing Image Classification

Zhang, Bo; Ming, Zuheng; Liu, Yaqian; Feng, Wei; He, Liang; Zhao, Kaixing

doi:10.1007/978-981-99-8850-1_27


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14473)

Included in the following conference series:

CAAI International Conference on Artificial Intelligence


Abstract

Remote sensing (RS) is widely used in Earth observation (EO) missions, including land cover classification and environmental monitoring. Unlike natural images in computer vision tasks, remote sensing data are difficult to collect. To fully exploit the available data and leverage complementary information across data sources, we propose the Multimodal Transformer for Remote Sensing (RsMmFormer), an image classification approach that uses both Hyperspectral Image (HSI) and Light Detection and Ranging (LiDAR) data. The conventional Vision Transformer (ViT) lacks the inductive biases built into convolutions, so we incorporate convolutional layers into RsMmFormer to retain the favorable characteristics of convolutional neural networks (CNNs). We further introduce the Multi-scale Multi-head Self-Attention (MSMHSA) module, which learns fine-grained representations and thereby facilitates the detection of small targets that occupy only a few pixels in a remote sensing image. The MSMHSA module integrates HSI and LiDAR data progressively and at a fine level of detail, using self-attention to attend to both global and local contexts. Comprehensive experiments on the Trento and MUUFL benchmarks demonstrate the effectiveness and superiority of RsMmFormer for remote sensing image classification.
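
The abstract does not spell out how MSMHSA is built, so the following is only an illustrative sketch of the general idea of attending at several scales over a joint HSI and LiDAR token sequence. Everything in it is an assumption made for illustration: the function names (`avg_pool`, `attention`, `multiscale_attention`), the use of average pooling over tokens to form coarser scales, the identity query/key/value projections, and the toy 4-dimensional tokens. It should not be read as the authors' module.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def avg_pool(tokens, window):
    """Average non-overlapping groups of `window` tokens (last group may be shorter)."""
    pooled = []
    for start in range(0, len(tokens), window):
        group = tokens[start:start + window]
        dim = len(group[0])
        pooled.append([sum(t[d] for t in group) / len(group) for d in range(dim)])
    return pooled

def attention(queries, keys, values):
    """Scaled dot-product attention; returns one output vector per query."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return out

def multiscale_attention(hsi_tokens, lidar_tokens, windows=(1, 2, 4)):
    """Assumed design: one head per pooling window, head outputs concatenated per token."""
    tokens = hsi_tokens + lidar_tokens      # joint sequence of both modalities
    heads = []
    for w in windows:
        context = avg_pool(tokens, w)       # w=1 keeps full detail, larger w is coarser
        heads.append(attention(tokens, context, context))
    return [sum((head[i] for head in heads), []) for i in range(len(tokens))]

# Toy input: three HSI tokens and one LiDAR token, each of dimension 4.
hsi = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2], [0.3, 0.3, 0.3, 0.3]]
lidar = [[0.9, 0.0, 0.1, 0.0]]
fused = multiscale_attention(hsi, lidar)
print(len(fused), len(fused[0]))  # 4 tokens, 3 heads * 4 dims = 12 features each
```

In this sketch the head with window 1 attends to every individual token (local detail), while the head with window 4 sees a single averaged token (global context). A trained model would add learned projections and operate on 2D patches rather than a flat token list.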



Author information

Authors and Affiliations

School of Software, Northwestern Polytechnical University, Xi’an, China
Bo Zhang, Yaqian Liu, Liang He & Kaixing Zhao
Laboratoire L2TI, Institut Galilée, Université Sorbonne Paris Nord, Villetaneuse, France
Zuheng Ming
School of Electronic Engineering, Xidian University, Xi’an, China
Wei Feng

Corresponding authors

Correspondence to Liang He or Kaixing Zhao.

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Lu Fang
Duke University, Durham, NC, USA
Jian Pei
Shanghai Jiao Tong University, Shanghai, China
Guangtao Zhai
Chinese Academy of Sciences, Beijing, China
Ruiping Wang

About this paper

Cite this paper

Zhang, B., Ming, Z., Liu, Y., Feng, W., He, L., Zhao, K. (2024). RsMmFormer: Multimodal Transformer Using Multiscale Self-attention for Remote Sensing Image Classification. In: Fang, L., Pei, J., Zhai, G., Wang, R. (eds) Artificial Intelligence. CICAI 2023. Lecture Notes in Computer Science, vol 14473. Springer, Singapore. https://doi.org/10.1007/978-981-99-8850-1_27


DOI: https://doi.org/10.1007/978-981-99-8850-1_27
Published: 04 February 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8849-5
Online ISBN: 978-981-99-8850-1
eBook Packages: Computer Science, Computer Science (R0)
