DOI: 10.1145/3581783.3612374

A Prior Instruction Representation Framework for Remote Sensing Image-text Retrieval

Published: 27 October 2023

Abstract

This paper presents a Prior Instruction Representation (PIR) framework for remote sensing image-text retrieval, targeting remote sensing vision-language understanding and, in particular, the semantic noise problem. Its central idea is a paradigm that draws on prior knowledge to instruct the adaptive learning of vision and text representations. Concretely, two progressive attention encoder (PAE) structures, Spatial-PAE and Temporal-PAE, are proposed to model long-range dependencies and enhance key feature representation. For vision, Vision Instruction Representation (VIR), built on Spatial-PAE, exploits prior knowledge from remote sensing scene recognition by constructing a belief matrix that selects key features and reduces the impact of semantic noise. For text, Language Cycle Attention (LCA), built on Temporal-PAE, uses the previous time step to cyclically activate the current one and thereby strengthen the text representation. A cluster-wise affiliation loss is further proposed to constrain inter-class relations and shrink the semantic confusion zones in the common subspace. Comprehensive experiments show that prior knowledge instruction enhances both vision and text representations and that PIR outperforms state-of-the-art methods on two benchmark datasets, RSICD and RSITMD. Code is available at https://github.com/Zjut-MultimediaPlus/PIR-pytorch.
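To make the prior-instruction idea concrete, the sketch below shows one way a belief matrix over scene-category priors could re-weight visual tokens before aggregation, in the spirit of VIR. This is a minimal PyTorch illustration under assumed shapes and names (prior_guided_weighting, patch_feats, prior_embeds are hypothetical, introduced here only for illustration), not the authors' implementation; see the linked repository for the actual PIR code.

import torch
import torch.nn.functional as F

def prior_guided_weighting(patch_feats, prior_embeds, temperature=0.07):
    """Re-weight visual tokens by their affinity to scene-category priors.

    patch_feats:  (B, N, D) visual token features from an image encoder
    prior_embeds: (C, D)    embeddings of C remote sensing scene categories
    returns:      (B, D)    aggregated, prior-instructed image representation
    """
    q = F.normalize(patch_feats, dim=-1)               # (B, N, D)
    k = F.normalize(prior_embeds, dim=-1)              # (C, D)
    belief = q @ k.t() / temperature                   # (B, N, C) token-to-prior "belief" scores
    token_score = belief.max(dim=-1).values            # (B, N): strongest prior affinity per token
    weights = token_score.softmax(dim=-1)              # (B, N): down-weights semantically noisy tokens
    return (weights.unsqueeze(-1) * patch_feats).sum(dim=1)

# Toy usage: a 7x7 patch grid and 30 hypothetical scene categories.
feats = torch.randn(2, 49, 512)
priors = torch.randn(30, 512)
print(prior_guided_weighting(feats, priors).shape)     # torch.Size([2, 512])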

    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN: 9798400701085
    DOI: 10.1145/3581783

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. image-text retrieval
    2. prior instruction
    3. remote sensing

    Qualifiers

    • Research-article

    Funding Sources

    • Zhejiang Provincial Natural Science Foundation of China
    • Natural Science Foundation of China

    Conference

    MM '23
    Sponsor:
    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Cited By

    • (2025) Toward Efficient and Accurate Remote Sensing Image–Text Retrieval With a Coarse-to-Fine Approach. IEEE Geoscience and Remote Sensing Letters, Vol. 22, 1-5. DOI: 10.1109/LGRS.2024.3494543
    • (2024) Accurate and Lightweight Learning for Specific Domain Image-Text Retrieval. Proceedings of the 32nd ACM International Conference on Multimedia, 9719-9728. DOI: 10.1145/3664647.3681280
    • (2024) Eliminate Before Align: A Remote Sensing Image-Text Retrieval Framework with Keyword Explicit Reasoning. Proceedings of the 32nd ACM International Conference on Multimedia, 1662-1671. DOI: 10.1145/3664647.3681270
    • (2024) Heterogeneous Graph Guided Contrastive Learning for Spatially Resolved Transcriptomics Data. Proceedings of the 32nd ACM International Conference on Multimedia, 8287-8295. DOI: 10.1145/3664647.3680941
    • (2024) Reversed in Time: A Novel Temporal-Emphasized Benchmark for Cross-Modal Video-Text Retrieval. Proceedings of the 32nd ACM International Conference on Multimedia, 5260-5269. DOI: 10.1145/3664647.3680731
    • (2024) UrbanCross: Enhancing Satellite Image-Text Retrieval with Cross-Domain Adaptation. Proceedings of the 32nd ACM International Conference on Multimedia, 6307-6315. DOI: 10.1145/3664647.3680604
    • (2024) MGIMM: Multi-Granularity Instruction Multimodal Model for Attribute-Guided Remote Sensing Image Detailed Description. IEEE Transactions on Geoscience and Remote Sensing, Vol. 62, 1-13. DOI: 10.1109/TGRS.2024.3497976
    • (2024) Transcending Fusion: A Multiscale Alignment Method for Remote Sensing Image–Text Retrieval. IEEE Transactions on Geoscience and Remote Sensing, Vol. 62, 1-17. DOI: 10.1109/TGRS.2024.3496898
    • (2024) RS5M and GeoRSCLIP: A Large-Scale Vision-Language Dataset and a Large Vision-Language Model for Remote Sensing. IEEE Transactions on Geoscience and Remote Sensing, Vol. 62, 1-23. DOI: 10.1109/TGRS.2024.3449154
    • (2024) RemoteCLIP: A Vision Language Foundation Model for Remote Sensing. IEEE Transactions on Geoscience and Remote Sensing, Vol. 62, 1-16. DOI: 10.1109/TGRS.2024.3390838
