A rotation-invariant horizontal vertical pooled module for remote sensing image representation

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Accurate information retrieval from multi-source and multi-resolution image data is a foundation for knowledge discovery. Scene image classification using aerial very high resolution (VHR) images is a well-researched area in the remote sensing (RS) community, one that mostly utilises deep learning (DL)-based methods owing to their remarkable classification performance. Nevertheless, existing DL-based methods still have a limited ability to capture the precise spatial semantic information scattered along the horizontal and vertical directions of such images at multiple scales and rotations. We therefore propose a novel approach, employing an innovative rotation-invariant horizontal vertical pooled module (RIHVPM), to represent aerial VHR RS images for stable and improved classification performance. Notably, the proposed RIHVPM couples multiple tensor rotations with attention-enabled multiscale horizontal and vertical pooling operations for image representation. An experimental study on three benchmark datasets demonstrates the competitive or superior classification performance (AID: 96.44%, NWPU: 94.32%, and UCM: 99.04%) and robustness/stability (minimum standard deviation of 0.001) of the proposed approach.
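The abstract describes the module only at a high level. As a rough illustration of the idea, the sketch below pools a convolutional feature map along the horizontal and vertical axes, repeats this over all four 90-degree tensor rotations, and aggregates the results with a simple attention weighting. It is a minimal sketch in TensorFlow/Keras assuming standard channels-last feature tensors; the layer name, the mean-pooling choice, the averaging over four rotations, and the sigmoid attention are assumptions made for illustration, not the authors' published implementation, and the multiscale aspect of the module is omitted for brevity.

```python
import tensorflow as tf
from tensorflow.keras import layers


class RotationInvariantHVPool(layers.Layer):
    """Hypothetical sketch of rotation-invariant horizontal/vertical pooling.

    Inferred from the abstract only; not the authors' implementation.
    """

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # One sigmoid attention score per pooled position (assumed design).
        self.attn = layers.Dense(1, activation="sigmoid")

    @staticmethod
    def _hv_pool(x):
        # Average the (B, H, W, C) feature map over its width to get a
        # horizontal descriptor and over its height to get a vertical one,
        # then concatenate them along the position axis.
        h = tf.reduce_mean(x, axis=2)     # (B, H, C)
        v = tf.reduce_mean(x, axis=1)     # (B, W, C)
        return tf.concat([h, v], axis=1)  # (B, H + W, C)

    def call(self, x):
        # Pool all four 90-degree rotations of the tensor and average.
        # Summing over the full rotation orbit makes the result identical
        # for an input rotated by any multiple of 90 degrees.
        pooled = tf.add_n(
            [self._hv_pool(tf.image.rot90(x, k=k)) for k in range(4)]
        ) / 4.0                                         # (B, H + W, C)
        weights = self.attn(pooled)                     # (B, H + W, 1)
        return tf.reduce_sum(weights * pooled, axis=1)  # (B, C)
```

Applied to the output of any convolutional backbone, such a layer would yield one descriptor vector per image that is unchanged under 90-degree rotations of the feature map; invariance to arbitrary rotation angles would require further measures that this sketch does not attempt.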




Data availability statement

All three datasets used in this paper are publicly available. UCM: http://weegee.vision.ucmerced.edu/datasets/landuse.html; AID: https://captain-whu.github.io/AID/ and NWPU: https://gcheng-nwpu.github.io/.


Acknowledgements

Jagannath Aryal is supported by internal funding from the University of Melbourne (UoM) for this research. Further, this research, which is the outcome of the first author's postdoctoral work, was supported by UoM's Research Computing Services and the Petascale Campus Initiative.

Author information


Contributions

CS contributed to data curation, conceptualization, methodology, software, writing (original draft preparation), and writing (review and editing). JA contributed to supervision, writing (review), validation, resources, and project administration.

Corresponding author

Correspondence to Chiranjibi Sitaula.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Code availability

The source code will be made available upon request by the journal and will be made publicly available after publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Sitaula, C., Aryal, J. A rotation-invariant horizontal vertical pooled module for remote sensing image representation. Neural Comput & Applic 36, 18661–18673 (2024). https://doi.org/10.1007/s00521-024-10180-8
