Abstract:
Remote sensing image classification is at the center of many tasks in the remote sensing domain. However, the complexity and content variety of aerial images keep the task challenging. Transformers have recently achieved state-of-the-art performance on numerous natural language processing and image processing tasks. In this paper, we propose a novel solution for remote sensing image classification based on a general-to-specific learning curve achieved through a cascaded chain of Vision Transformers (ViTs). First, a standard pre-trained ViT is used to provide general information regarding the remote sensing scenes, whereas the specific details are learned by means of a Convolutional Vision Transformer (CvT), which is trained end-to-end. Experiments conducted on two benchmark datasets of high-resolution remote sensing images show the effectiveness of the proposed technique. Comparisons to other methods in the literature are also provided.
Date of Conference: 17-22 July 2022
Date Added to IEEE Xplore: 28 September 2022