Abstract
As a fundamental technique for mining and analyzing remote sensing (RS) big data, content-based remote sensing image retrieval (CBRSIR) has received considerable attention. Recently, cross-source CBRSIR (CS-CBRSIR) has become one of the most challenging tasks in the RS community. Because of the data drift between sources, it is hard to find a similarity metric function that accurately measures the similarity between RS images from different sources. To address this issue, instead of directly using manually designed similarity metrics, we propose an end-to-end similarity metric learning network, the Siamese Transformer Network (STN), for CS-CBRSIR. Specifically, the proposed STN consists of three modules: (1) a feature extraction module, named ConViT, which combines a Vision Transformer (ViT) with convolution layers; (2) a similarity metric function, which is a fully connected neural network (FCNN) that computes the similarity between the output features from different sources; and (3) a smooth average-precision (Smooth-AP) loss function, which serves as a surrogate for the standard AP metric and is used to optimize the similarity metric function through backpropagation. The learned similarity metric function can then be used to perform CS-CBRSIR accurately. Extensive experiments and ablation studies demonstrate that the proposed approach achieves promising performance on the CS-CBRSIR task.
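To make the role of the Smooth-AP loss more concrete, the following is a minimal NumPy sketch of the smoothed AP computation for a single query. It is an illustration under stated assumptions, not the authors' implementation: the function names `soft_step`, `smooth_ap`, and `smooth_ap_loss`, the temperature value `tau=0.01`, and the hard-coded scores are all hypothetical. In the network described above, the scores would come from the learned similarity metric function, and the same arithmetic would be written in an automatic-differentiation framework so that gradients can flow back through it.

```python
import numpy as np


def soft_step(x, tau):
    """Temperature-scaled sigmoid, written with tanh for numerical stability.

    0.5 * (1 + tanh(x / (2 * tau))) equals 1 / (1 + exp(-x / tau)).
    It replaces the non-differentiable step function used by exact AP.
    """
    return 0.5 * (1.0 + np.tanh(x / (2.0 * tau)))


def smooth_ap(scores, is_positive, tau=0.01):
    """Smoothed average precision for one query.

    scores      : similarity of the query to each gallery item
    is_positive : True where the gallery item is relevant to the query
    tau         : temperature (assumed value); smaller is closer to exact AP
    """
    scores = np.asarray(scores, dtype=np.float64)
    pos = np.asarray(is_positive, dtype=bool)

    # diff[i, j] = s_j - s_i, positive when item j is scored above item i.
    diff = scores[None, :] - scores[:, None]
    above = soft_step(diff, tau)
    np.fill_diagonal(above, 0.0)  # an item does not outrank itself

    # Soft rank of each item among all items, and among positives only.
    rank_all = 1.0 + above.sum(axis=1)
    rank_pos = 1.0 + above[:, pos].sum(axis=1)

    # Average the soft precision over the positive items.
    return float(np.mean(rank_pos[pos] / rank_all[pos]))


def smooth_ap_loss(scores, is_positive, tau=0.01):
    """Loss to minimize: one minus the smoothed AP."""
    return 1.0 - smooth_ap(scores, is_positive, tau)


# Hypothetical similarity scores for four gallery images; the first two are relevant.
labels = [True, True, False, False]
print(smooth_ap_loss([0.9, 0.8, 0.2, 0.1], labels))  # relevant images ranked first
print(smooth_ap_loss([0.1, 0.2, 0.8, 0.9], labels))  # relevant images ranked last
```

The loss is close to zero when the relevant images receive the highest similarity scores and grows as they are pushed down the ranking, which is the behavior a ranking-based surrogate needs in order to train a similarity function for retrieval.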









Notes
ViT-Pytorch toolkit is available at https://github.com/lucidrains/vit-pytorch.
PatternNet is available at https://sites.google.com/view/zhouwx/dataset.
FAIR1M is available at http://gaofen-challenge.com/indexpage.
DSRSID dataset is available from Baidu Cloud Storage at https://pan.baidu.com/s/15ZWaZ2yArnvwcwtead_rpQ.
Ethics declarations
Funding
This work is supported in part by the National Natural Science Foundation of China under Grants 61972205, U1936218, and U20A20176, in part by the Guangdong Natural Science Funds for Distinguished Young Scholar, and in part by the Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET) fund, China.
Conflicts of interest
The authors declare no conflicts of interest.
Data availability statement
Data are available on request due to privacy or other restrictions.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ding, C., Wang, M., Zhou, Z. et al. Siamese transformer network-based similarity metric learning for cross-source remote sensing image retrieval. Neural Comput & Applic 35, 8125–8142 (2023). https://doi.org/10.1007/s00521-022-08092-6
DOI: https://doi.org/10.1007/s00521-022-08092-6