Siamese transformer network-based similarity metric learning for cross-source remote sensing image retrieval

Ding, Chun; Wang, Meimin; Zhou, Zhili; Huang, Teng; Wang, Xiaoliang; Li, Jin

doi:10.1007/s00521-022-08092-6

Original Article
Published: 07 December 2022

Volume 35, pages 8125–8142 (2023)

Neural Computing and Applications

Chun Ding²,
Meimin Wang²,
Zhili Zhou ORCID: orcid.org/0000-0002-5641-7169¹,
Teng Huang¹,
Xiaoliang Wang³ &
…
Jin Li¹

Abstract

As a fundamental technique for mining and analyzing remote sensing (RS) big data, content-based remote sensing image retrieval (CBRSIR) has received considerable attention. Recently, cross-source CBRSIR (CS-CBRSIR) has become one of the most challenging tasks in the RS community. Because of the data drift between sources, it is hard to find a similarity metric function that accurately measures similarities between RS images from different sources. To address this issue, instead of directly using manually designed similarity metrics, we propose an end-to-end similarity metric learning network, the Siamese Transformer Network (STN), for CS-CBRSIR. Specifically, the proposed STN consists of three modules: (1) a feature extraction module, named ConViT, which combines a Vision Transformer (ViT) with convolution layers; (2) a similarity metric function, which is a fully connected neural network (FCNN) that computes the similarity between the output features from different sources; and (3) a smooth average-precision (Smooth-AP) loss function, a differentiable surrogate for the standard AP metric, which is used to optimize the similarity metric function through backpropagation. The learned similarity metric function can then be used to perform CS-CBRSIR accurately. Extensive experiments and ablation studies demonstrate that the proposed approach achieves promising performance on the CS-CBRSIR task.
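To make the role of the Smooth-AP loss more concrete, the following is a minimal sketch of the generic Smooth-AP computation for a single query, written in plain Python. It is not the authors' implementation: the function names, the temperature value `tau=0.01`, and the toy scores are illustrative assumptions. The idea is that the hard "is item j ranked above item i" indicator used in average precision is replaced by a sigmoid of the score difference, so the resulting quantity can be optimized by gradient descent.

```python
import math


def _sigmoid(x, tau):
    """Temperature-scaled sigmoid, written to avoid overflow in math.exp."""
    z = x / tau
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def smooth_ap(scores, labels, tau=0.01):
    """Smoothed average precision for one query.

    scores: similarity of each gallery image to the query (higher = more similar)
    labels: 1 if the gallery image is relevant to the query, else 0
    tau:    sigmoid temperature (assumed value; smaller is closer to true AP)
    """
    positives = [i for i, y in enumerate(labels) if y]
    if not positives:
        raise ValueError("at least one relevant item is required")
    total = 0.0
    for i in positives:
        # Soft rank of item i among the relevant items only.
        rank_pos = 1.0 + sum(
            _sigmoid(scores[j] - scores[i], tau) for j in positives if j != i
        )
        # Soft rank of item i among all gallery items.
        rank_all = 1.0 + sum(
            _sigmoid(scores[j] - scores[i], tau)
            for j in range(len(scores))
            if j != i
        )
        total += rank_pos / rank_all
    return total / len(positives)


def smooth_ap_loss(scores, labels, tau=0.01):
    """Loss to minimize: one minus the smoothed average precision."""
    return 1.0 - smooth_ap(scores, labels, tau)


if __name__ == "__main__":
    # Hypothetical similarity scores, e.g. produced by a learned metric network.
    print(smooth_ap_loss([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # relevant items first
    print(smooth_ap_loss([0.9, 0.8, 0.3, 0.1], [0, 0, 1, 1]))  # relevant items last
```

In a Siamese setup of the kind the abstract describes, the scores would presumably come from the learned similarity function applied to pairs of features from the two sources; here they are hard-coded so that the sketch stays self-contained.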

Notes

The Smooth-AP code is available at https://github.com/Andrew-Brown1/Smooth_AP.
ViT-Pytorch toolkit is available at https://github.com/lucidrains/vit-pytorch.
PatternNet is available at https://sites.google.com/view/zhouwx/dataset.
FAIR1M is available at http://gaofen-challenge.com/indexpage.
DSRSID dataset is available from Baidu Cloud Storage at https://pan.baidu.com/s/15ZWaZ2yArnvwcwtead_rpQ.

Author information

Authors and Affiliations

Institute of Artificial Intelligence and Blockchain, Guangzhou University, Guangzhou, 510006, Guangdong, China
Zhili Zhou, Teng Huang & Jin Li
Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology, No.219, Ningliu Road, Nanjing, 210044, Jiangsu, China
Chun Ding & Meimin Wang
School of Computer Science and Engineering, Hunan University of Science and Technology, No.2, South Lushan Road, Xiangtan, 411201, Hunan, China
Xiaoliang Wang

Corresponding author

Correspondence to Zhili Zhou.

Ethics declarations

Funding

This work is supported in part by the National Natural Science Foundation of China under Grant 61972205, Grant U1936218, and Grant U20A20176, in part by the Guangdong Natural Science Funds for Distinguished Young Scholar, and in part by the Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET) fund, China.

Conflicts of interest

The authors declare no conflicts of interest.

Data availability statement

Data are available on request due to privacy or other restrictions.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ding, C., Wang, M., Zhou, Z. et al. Siamese transformer network-based similarity metric learning for cross-source remote sensing image retrieval. Neural Comput & Applic 35, 8125–8142 (2023). https://doi.org/10.1007/s00521-022-08092-6

Received: 08 May 2022
Accepted: 22 November 2022
Published: 07 December 2022
Issue Date: April 2023
DOI: https://doi.org/10.1007/s00521-022-08092-6
