Abstract:
Hyperspectral images contain rich latent information that can be exploited for ground object recognition, and self-supervised learning (SSL) offers an efficient way to do so without manual labeling. However, severe spectral uncertainty makes it difficult to learn discriminative and generalizable representations through self-supervision. This letter proposes a variational generative transformer (VGT) with momentum contrastive supervision (ConVaT) to alleviate the problem. ConVaT contains two branches: a variational generative branch and a contrastive learning branch. The former guides informative data representation via an encoder–decoder transformer with variational inference; the latter makes the representation discriminative by distinguishing positive anchors from negative ones. Notably, to obtain a more generalizable latent representation, we reconstruct the data from reparameterized tokens sampled multiple times from the global anchor, rather than from the latent representation of the unmasked data. Extensive experiments on three public datasets show that ConVaT yields representations with intraclass clustering and interclass separation, and that it achieves considerable improvements over existing methods under linear probing, especially on the Indian Pines (IP) dataset, which exhibits intense spectral uncertainty. Our code will be available at https://github.com/liuzuo-byte/ConVaT.
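The two ingredients named in the abstract, reparameterized sampling from a global anchor and contrastive discrimination of a positive from negatives, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the diagonal-Gaussian parameterization (`mu`, `logvar`), the InfoNCE-style loss, and the temperature of 0.07 are all assumptions chosen for illustration, and the transformer encoder, decoder, and momentum update are omitted entirely.

```python
import math
import random


def reparameterize(mu, logvar, n_samples, rng):
    """Draw n_samples tokens z = mu + sigma * eps, with eps ~ N(0, I).

    mu and logvar are hypothetical per-dimension parameters of a
    diagonal Gaussian attached to the global anchor.
    """
    sigma = [math.exp(0.5 * lv) for lv in logvar]
    return [
        [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
        for _ in range(n_samples)
    ]


def _normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE-style loss: low when query is closer to positive than to negatives."""
    q = _normalize(query)
    keys = [_normalize(positive)] + [_normalize(n) for n in negatives]
    logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
    # Numerically stable log-sum-exp over all logits.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


# Toy usage with made-up three-dimensional vectors.
rng = random.Random(0)
mu = [0.5, -1.0, 2.0]
logvar = [-2.0, -2.0, -2.0]
tokens = reparameterize(mu, logvar, n_samples=4, rng=rng)

opposite = [-0.5, 1.0, -2.0]
loss_close = info_nce(mu, mu, [opposite])   # positive matches the query
loss_far = info_nce(mu, opposite, [mu])     # positive opposes the query
```

In this toy setting each sampled token is a noisy copy of the anchor mean, and the loss is small when the positive key aligns with the query and large when a negative aligns better. How ConVaT actually combines the sampled tokens with the decoder is described in the letter itself, not here.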
Published in: IEEE Geoscience and Remote Sensing Letters (Volume: 21)