Abstract:
Convolutional neural networks (CNNs) have been widely used in change detection (CD) for their powerful discriminative ability, but most CNN-based methods are still exploring ways to capture relatively long-range context in the spatial-temporal domain. The recent vision transformer (ViT), based on the self-attention mechanism, has been applied to CD to model long-range dependencies. However, such transformer-based architectures do not fully exploit the interdependencies among high-level semantic feature maps and easily overlook local detail features, resulting in noncompact interiors of large-scale change areas and missed small changes. We therefore propose a new transformer-based hybrid network, the interact-feature transformer network with a spatial detail enhancement module (IFTSDNet), which takes advantage of transformers to capture long-range context and of CNNs to extract local information. We design an interact-feature transformer (IFT) that not only obtains global contextual information but also achieves interactions among the high-level semantic feature maps. The spatial detail enhancement module (SDEM), built with a group of branches with various receptive fields, refines spatial features and incorporates more discriminative feature representations. Comparative experiments prove the effectiveness of the proposed method, which outperforms four recent transformer-based methods. The code will be available at https://github.com/wanglinlin0219/IFTSDNet.
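The multi-receptive-field idea behind the SDEM can be illustrated with a minimal sketch: a feature map is processed by several parallel branches, each pooling context over a different window size, and the branch outputs are fused. This is only a conceptual toy (the function name, window sizes, and mean-pooling fusion are assumptions; the paper's actual module uses learned convolutions), not the authors' implementation.

```python
import numpy as np

def sdem_sketch(feat, window_sizes=(1, 3, 5)):
    """Toy multi-receptive-field refinement (hypothetical, not the paper's SDEM):
    average-pool a 2-D feature map at several window sizes and fuse the branches
    by averaging, so each output pixel mixes local detail with wider context."""
    h, w = feat.shape
    fused = np.zeros((h, w), dtype=float)
    for k in window_sizes:
        pad = k // 2
        # Edge-pad so every branch output keeps the input's spatial size.
        padded = np.pad(feat, pad, mode="edge")
        branch = np.zeros((h, w), dtype=float)
        for i in range(h):
            for j in range(w):
                branch[i, j] = padded[i:i + k, j:j + k].mean()
        fused += branch
    return fused / len(window_sizes)
```

In a trained network the fixed average filters would be replaced by learned convolution kernels of different sizes (or dilations), but the sketch shows why such a group enhances spatial detail: small windows preserve local edges while larger ones provide the surrounding context.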
Published in: IEEE Geoscience and Remote Sensing Letters ( Volume: 20)