Abstract:
In this paper, we introduce an efficient and lightweight hybrid Transformer architecture that integrates convolutions within Transformer blocks for semantic segmentation of Very High Resolution (VHR) remote sensing imagery. To avoid the high computational complexity of attention in the shallow layers while still capturing the local representations of VHR images, we propose the Group-Team Convolution Modulation (GTCM) module, which uses convolutions to approximate the effect of attention mechanisms and modulates features along the channel dimension. Additionally, to enlarge the effective receptive field (ERF) of the decoder, we follow the same grouping idea and adopt dilated convolutions with multiple dilation rates, which further improves performance. Experiments on the Vaihingen and Potsdam datasets show that the proposed hybrid structure outperforms state-of-the-art methods with relatively lower complexity and fewer parameters.
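The two design arguments above (attention is expensive on large shallow feature maps; grouped dilations widen the receptive field at fixed kernel cost) can be checked with simple arithmetic. The sketch below is a back-of-the-envelope illustration under stated assumptions, not the GTCM module or the authors' decoder. It counts only the Q·Kᵀ and attention·V multiply-adds of global self-attention (projections and softmax ignored; the total does not depend on the number of heads) and compares them with a k×k depthwise convolution. The equal channel split with one dilation rate per group, and all names (`attention_macs`, `grouped_dilation_plan`, etc.), are hypothetical choices made for this example.

```python
"""Back-of-the-envelope sketch (hypothetical, not the paper's implementation):
cost of global self-attention vs. a depthwise convolution on a shallow,
high-resolution feature map, and the window covered by each channel group
when the groups use different dilation rates."""

from __future__ import annotations

from dataclasses import dataclass


def attention_macs(h: int, w: int, c: int) -> int:
    """Multiply-adds of global self-attention on an h x w x c feature map.

    Counts only Q @ K^T and softmax(...) @ V; linear projections and the
    softmax itself are ignored. Splitting c across heads leaves the total
    unchanged, so the head count does not appear.
    """
    n = h * w  # number of tokens (pixels)
    return 2 * n * n * c  # two products, each of size n x n x c


def depthwise_conv_macs(h: int, w: int, c: int, k: int) -> int:
    """Multiply-adds of a k x k depthwise convolution with 'same' padding."""
    return h * w * c * k * k


def dilated_extent(k: int, d: int) -> int:
    """Side length of the window spanned by a k x k kernel with dilation d."""
    return d * (k - 1) + 1


@dataclass
class GroupPlan:
    channels: int  # channels assigned to this group
    dilation: int  # dilation rate used by this group
    extent: int    # side length of the window this group sees


def grouped_dilation_plan(c: int, k: int, rates: list[int]) -> list[GroupPlan]:
    """Hypothetical layout: split c channels evenly, one dilation rate per group.

    Every group keeps the same k x k cost, but larger rates see wider windows.
    """
    if c % len(rates):
        raise ValueError("channels must divide evenly among the groups")
    per_group = c // len(rates)
    return [GroupPlan(per_group, d, dilated_extent(k, d)) for d in rates]


if __name__ == "__main__":
    # Assumed shallow-stage size: 128 x 128 map, 64 channels, 7 x 7 kernel.
    h = w = 128
    c, k = 64, 7
    att = attention_macs(h, w, c)
    conv = depthwise_conv_macs(h, w, c, k)
    print(f"attention: {att:,} MACs; depthwise {k}x{k}: {conv:,} MACs")
    print(f"ratio: {att / conv:.1f}x")
    # Assumed decoder grouping: 3 x 3 kernels with rates 1, 2, 3, 5.
    for group in grouped_dilation_plan(c=64, k=3, rates=[1, 2, 3, 5]):
        print(group)
```

Under these counts the ratio reduces to 2·h·w / k², so the gap grows linearly with the number of pixels, which is why the cost matters most in the shallow, high-resolution stages. In the grouping sketch, each group pays for a 3×3 kernel, yet the groups together cover windows of 3, 5, 7 and 11 pixels per side.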
Date of Conference: 07-12 July 2024
Date Added to IEEE Xplore: 05 September 2024