RsMmFormer: Multimodal Transformer Using Multiscale Self-attention for Remote Sensing Image Classification

  • Conference paper
Artificial Intelligence (CICAI 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14473)

Abstract

Remote Sensing (RS) has been widely used in Earth Observation (EO) missions, including land cover classification and environmental monitoring. Compared with the natural images used in computer vision tasks, remote sensing data is more difficult to collect. To fully exploit the available data and leverage the complementary information across data sources, we propose the Multimodal Transformer for Remote Sensing (RsMmFormer), an image classification approach that uses both Hyperspectral Image (HSI) and Light Detection and Ranging (LiDAR) data. The conventional Vision Transformer (ViT) lacks the inductive biases inherent to convolutions, so we incorporate convolutional layers into RsMmFormer, which lets the model inherit the favorable characteristics of convolutional neural networks (CNNs). We further introduce the Multi-scale Multi-head Self-Attention (MSMHSA) module, which learns fine-grained representations and thereby facilitates the detection of small targets that occupy only a few pixels in a remote sensing image. The MSMHSA module integrates HSI and LiDAR data progressively, attending to both global and local contexts through self-attention. Comprehensive experiments on the Trento and MUUFL benchmarks demonstrate the effectiveness and superiority of RsMmFormer for remote sensing image classification.
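
The idea of attending to fused HSI and LiDAR tokens at several scales can be illustrated with a minimal NumPy sketch. It is not the authors' MSMHSA implementation, which the abstract does not specify. The function names (`pool_tokens`, `multiscale_self_attention`), the average-pooling of keys and values, the scale set `(1, 2, 4)`, the single head, and the absence of learned projections are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def pool_tokens(tokens, scale):
    """Average consecutive groups of `scale` tokens to get a coarser view.

    The last token is repeated as padding when the token count is not a
    multiple of `scale` (an illustrative choice, not taken from the paper).
    """
    n, d = tokens.shape
    pad = (-n) % scale
    if pad:
        tokens = np.concatenate([tokens, np.repeat(tokens[-1:], pad, axis=0)])
    return tokens.reshape(-1, scale, d).mean(axis=1)


def multiscale_self_attention(hsi_tokens, lidar_tokens, scales=(1, 2, 4)):
    """Hypothetical single-head multi-scale attention over fused tokens.

    Queries are the full-resolution fused tokens. Keys and values are the
    same tokens pooled at each scale: scale 1 keeps local detail, larger
    scales summarize wider context. Learned Q/K/V projections and multiple
    heads are omitted for brevity.
    """
    tokens = np.concatenate([hsi_tokens, lidar_tokens], axis=0)  # (N, d)
    d = tokens.shape[1]
    outputs = []
    for s in scales:
        kv = pool_tokens(tokens, s)                       # (ceil(N / s), d)
        attn = softmax(tokens @ kv.T / np.sqrt(d), axis=-1)
        outputs.append(attn @ kv)                         # (N, d)
    # Average the per-scale outputs (an assumed fusion rule).
    return np.mean(outputs, axis=0)


rng = np.random.default_rng(0)
hsi = rng.normal(size=(9, 8))    # e.g. 9 HSI patch tokens, 8 features each
lidar = rng.normal(size=(9, 8))  # 9 LiDAR tokens in the same feature space
fused = multiscale_self_attention(hsi, lidar)
print(fused.shape)
```

Each output row is a weighted average of input tokens, so the sketch only mixes information across modalities and scales. A trainable version would add projection matrices, multiple heads, and the convolutional layers that the abstract mentions.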

Author information

Corresponding authors

Correspondence to Liang He or Kaixing Zhao.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Zhang, B., Ming, Z., Liu, Y., Feng, W., He, L., Zhao, K. (2024). RsMmFormer: Multimodal Transformer Using Multiscale Self-attention for Remote Sensing Image Classification. In: Fang, L., Pei, J., Zhai, G., Wang, R. (eds) Artificial Intelligence. CICAI 2023. Lecture Notes in Computer Science, vol 14473. Springer, Singapore. https://doi.org/10.1007/978-981-99-8850-1_27

  • DOI: https://doi.org/10.1007/978-981-99-8850-1_27

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8849-5

  • Online ISBN: 978-981-99-8850-1

  • eBook Packages: Computer Science, Computer Science (R0)
