Abstract
Accurate information retrieval from multi-source and multi-resolution image data is a foundation for knowledge discovery. Scene image classification using aerial very high resolution (VHR) images is a well-researched area in the remote sensing (RS) community, and most methods in this area are deep learning (DL)-based because of their remarkable classification performance. Nevertheless, existing DL-based methods still have a limited ability to capture precise spatial semantic information scattered along the horizontal and vertical directions of such images at multiple scales and rotations. We therefore propose a novel approach that employs a rotation-invariant horizontal vertical pooled module (RIHVPM) to represent aerial VHR RS images effectively, yielding stable and improved classification performance. The proposed RIHVPM combines multiple tensor rotations with attention-enabled multiscale horizontal and vertical pooling operations for image representation. An experimental study on three benchmark datasets demonstrates competitive or higher classification accuracy (AID: 96.44%, NWPU: 94.32%, UCM: 99.04%) and robustness and stability (minimum standard deviation of 0.001) of the proposed approach.
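The abstract only outlines how RIHVPM builds its representation. As a rough illustration, the NumPy sketch below applies the same ingredients to a single (H, W, C) feature tensor: channel reweighting, multiscale horizontal and vertical strip pooling, and pooling over several tensor rotations. It is not the authors' implementation. The following are assumptions made for illustration only: the rotations are restricted to the four 90-degree turns and fused by averaging, the learned attention is replaced by a parameter-free sigmoid channel gate, the pooling scales are (1, 2, 4), and all function names are hypothetical.

```python
import numpy as np


def channel_gate(x):
    # Hypothetical parameter-free stand-in for learned attention:
    # reweight each channel of a (H, W, C) tensor by the sigmoid of its global mean.
    weights = 1.0 / (1.0 + np.exp(-x.mean(axis=(0, 1))))
    return x * weights


def directional_pool(x, scales, axis):
    # axis=1: average along the width (horizontal pooling, one value per row).
    # axis=0: average along the height (vertical pooling, one value per column).
    profile = x.mean(axis=axis)  # shape (H, C) or (W, C)
    feats = []
    for s in scales:
        # Split the 1-D profile into s strips and average each strip.
        for strip in np.array_split(profile, s, axis=0):
            feats.append(strip.mean(axis=0))  # shape (C,)
    return np.concatenate(feats)


def hv_descriptor(x, scales=(1, 2, 4)):
    # Gate channels, then concatenate horizontal and vertical multiscale features.
    x = channel_gate(x)
    return np.concatenate([directional_pool(x, scales, axis=1),
                           directional_pool(x, scales, axis=0)])


def rotation_pooled_descriptor(x, scales=(1, 2, 4)):
    # Average the H/V descriptor over the four 90-degree rotations of x
    # (an assumed rotation set, not necessarily the one used in the paper).
    descs = [hv_descriptor(np.rot90(x, k, axes=(0, 1)), scales) for k in range(4)]
    return np.mean(descs, axis=0)


# Usage on a random 8x6 feature map with 3 channels.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 6, 3))
d = rotation_pooled_descriptor(feat)
print(d.shape)  # (42,) = 2 directions x (1 + 2 + 4) strips x 3 channels
rotated = np.rot90(feat, 1, axes=(0, 1))
print(np.allclose(d, rotation_pooled_descriptor(rotated)))  # True
```

In this sketch, averaging over the four 90-degree rotations makes the descriptor invariant to 90-degree turns of the input, because rotating the input only permutes the set of rotated copies being averaged; invariance to arbitrary angles would require a different design.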







Data availability statement
All three datasets used in this paper are publicly available. UCM: http://weegee.vision.ucmerced.edu/datasets/landuse.html; AID: https://captain-whu.github.io/AID/ and NWPU: https://gcheng-nwpu.github.io/.
Acknowledgements
Jagannath Aryal is supported by internal research funding from the University of Melbourne (UoM). Further, this research, which is the outcome of the first author's postdoctoral research work, was supported by UoM's Research Computing Services and the Petascale Campus Initiative.
Author information
Contributions
CS was involved in data curation, conceptualization, methodology, software, writing (original draft preparation), and writing (review and editing). JA was involved in supervision, writing and review, validation, resources, and project administration.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Code availability
The source code will be made available upon request by the journal and will be made publicly available after publication.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Sitaula, C., Aryal, J. A rotation-invariant horizontal vertical pooled module for remote sensing image representation. Neural Comput & Applic 36, 18661–18673 (2024). https://doi.org/10.1007/s00521-024-10180-8
DOI: https://doi.org/10.1007/s00521-024-10180-8