Convolution-Embedded Vision Transformer With Elastic Positional Encoding for Pansharpening


Abstract:

Transformers, especially the vision transformer (ViT), are attracting increasing attention in various computer vision (CV) tasks. However, two pressing problems exist for the ViT: 1) because it attends to an image at the patch level, the ViT performs well at capturing global representations but is limited in extracting local features, which is an inherent strength of the convolutional neural network (CNN); and 2) the learnable positional encoding improves performance but limits the cross-resolution ability of the network; specifically, the pretrained model can only generate images of the same size as those used during training. To overcome these two problems, we propose a novel convolution-embedded ViT with elastic positional encoding in this article. On one hand, we propose a joint CNN and self-attention (CSA) network to collaboratively extract local and global features. On the other hand, we integrate an elastic CNN-based positional encoder into the framework to remove the ViT's rigid cross-resolution limitation and improve performance. Extensive experiments were conducted on IKONOS and WorldView-2 data with 4- and 8-band multispectral (MS) images, respectively. The visual and numerical results show the competitive performance of the proposed method.
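To see why a convolution-based positional encoder is "elastic" while a learnable encoding table is not, consider the following minimal NumPy sketch. It is an illustrative assumption, not the authors' exact design: position information is produced by sliding a small depthwise 3x3 kernel over the feature map, so the same learned kernel applies to feature maps of any height and width, whereas a learnable positional table would be fixed to the training resolution.

```python
import numpy as np

def conv_positional_encoding(x, kernel):
    """Sketch of an elastic, convolution-based positional encoding.

    x:      feature map of shape (C, H, W)
    kernel: depthwise 3x3 kernels of shape (C, 3, 3)

    Because the kernel slides over whatever H x W it is given, the
    encoding adapts to arbitrary input resolutions -- unlike a learnable
    positional table whose size is fixed at training time.
    (Hypothetical illustration, not the paper's exact encoder.)
    """
    C, H, W = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)), mode="edge")
    out = np.zeros_like(x)
    for c in range(C):          # depthwise: one kernel per channel
        for i in range(H):
            for j in range(W):
                out[c, i, j] = np.sum(padded[c, i:i + 3, j:j + 3] * kernel[c])
    return x + out              # features plus position signal

# The same kernel serves two different resolutions:
rng = np.random.default_rng(0)
k = rng.standard_normal((4, 3, 3)) * 0.1
small = conv_positional_encoding(rng.standard_normal((4, 8, 8)), k)
large = conv_positional_encoding(rng.standard_normal((4, 32, 32)), k)
```

The usage at the bottom is the point of the sketch: `small` and `large` are encoded by the identical kernel `k`, so a pretrained model using such an encoder is not tied to one image size.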
Article Sequence Number: 5413809
Date of Publication: 07 December 2022

