Multi-View Feature Fusion and Visual Prompt for Remote Sensing Image Captioning | IEEE Journals & Magazine | IEEE Xplore

Abstract:

Remote sensing image (RSI) captioning is a vision-language multimodal task concentrating on both image comprehension and sentence generation. Several studies suggest that encoder–decoder-based methods have achieved success in RSI captioning. However, existing encoder–decoder-based methods may not fully explore image representations for RSI captioning and suffer from a lack of additional prompt information for sentence generation. In this article, a novel multi-view feature fusion and prompt (MVP)-based model is proposed to obtain better RSI representations and enhance language model performance in RSI captioning. Specifically, we design an attention-based feature fusion module to dynamically fuse multi-view visual features, which are extracted from the fine-tuned vision-language pretraining (VLP) model and the vision-task pretraining (VP) model. Then, a flexible visual prefix mapping module is proposed to transform images into visual prefixes, providing semantic information for the subsequent sentence generation. Finally, a BERT-based caption generator is applied to generate accurate descriptions based on the fused visual features and the visual prefixes, which are both outputs from our designed modules. Extensive experiments are conducted on three well-known benchmark datasets, demonstrating that our method achieves state-of-the-art (SOTA) performance. The relevant code is available at https://github.com/QiaoLing-Lin/MVP.
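
The two modules named in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation (see the linked repository for that); it only shows one plausible reading of "attention-based fusion of multi-view features" followed by "mapping visual features to a prefix". The function names `fuse_views` and `map_to_prefix`, the single scoring vector, the mean pooling, and the linear projection are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def fuse_views(view_feats, score_vec):
    """Fuse several feature views with per-region attention weights.

    view_feats: list of views; each view is a list of n region vectors (length d).
    score_vec:  hypothetical learned scoring vector of length d (an assumption).
    Returns (fused, weights): fused is n x d, weights is n x num_views.
    """
    fused, all_weights = [], []
    for region in zip(*view_feats):  # the same region taken from every view
        weights = softmax([dot(vec, score_vec) for vec in region])
        fused.append([sum(w * vec[j] for w, vec in zip(weights, region))
                      for j in range(len(score_vec))])
        all_weights.append(weights)
    return fused, all_weights


def map_to_prefix(fused, proj, prefix_len):
    """Mean-pool fused features and project them to prefix_len prefix vectors.

    proj: list of prefix_len * d_model weight vectors, each of length d
          (a stand-in for a learned mapping network; an assumption).
    """
    n, d = len(fused), len(fused[0])
    pooled = [sum(row[j] for row in fused) / n for j in range(d)]
    flat = [dot(pooled, w) for w in proj]
    d_model = len(flat) // prefix_len
    return [flat[i * d_model:(i + 1) * d_model] for i in range(prefix_len)]


# Toy dimensions, chosen arbitrarily for the demo.
rng = random.Random(0)
n, d, d_model, prefix_len = 6, 8, 4, 3


def rand_matrix(rows, cols):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


vlp_feats = rand_matrix(n, d)  # stand-in for features from the VLP model
vp_feats = rand_matrix(n, d)   # stand-in for features from the VP model
score_vec = [rng.gauss(0, 1) for _ in range(d)]
proj = rand_matrix(prefix_len * d_model, d)

fused, weights = fuse_views([vlp_feats, vp_feats], score_vec)
prefix = map_to_prefix(fused, proj, prefix_len)
print(len(fused), len(fused[0]), len(prefix), len(prefix[0]))
```

In this sketch the attention weights are computed per region and sum to one across the views, so each fused vector is a convex combination of the corresponding VLP and VP vectors. A caption generator would then consume both `fused` and `prefix`; that step is omitted here because the abstract does not describe its interface.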
Article Sequence Number: 4708217
Date of Publication: 25 July 2024
