Dual-branch networks for privacy-preserving cross-modal retrieval in cloud computing

Peng, Jianting; Xiang, Xuyu; Qin, Jiaohua; Tan, Yun

doi:10.1007/s11227-024-06643-3

Published: 05 November 2024

Volume 81, article number 127, (2025)

The Journal of Supercomputing

Abstract

Cross-modal retrieval overcomes the limits of single-modality search by retrieving information across data of different modalities, meeting users’ need for correlated multi-modal results. While cloud computing offers high efficiency and low cost, concerns about data security impede its full potential. Privacy-preserving cross-modal retrieval emerges as a viable solution, meeting users’ demands for efficient retrieval while safeguarding data confidentiality. However, a major challenge remains in this field: how to bridge the inherent semantic gap within heterogeneous and chaotic information. To address this challenge, this paper proposes dual-branch networks for privacy-preserving cross-modal retrieval in cloud computing. First, a dual-branch feature extraction network for encrypted image-text data is constructed, enhancing the extraction of meaningful features from encrypted data. Second, a cross-modal alignment method is designed to eliminate the heterogeneity gap between modalities through alignment both within and between modalities. Finally, to fully exploit the storage and computing advantages of cloud computing, both the encrypted data and the cross-modal feature extractor are deployed to the cloud. Because the encrypted data stored in the cloud can be updated dynamically, the model can be refined continuously, which improves retrieval accuracy while reducing the storage and computational burdens on data owners. Extensive experiments on the publicly available benchmark image-text dataset Wikipedia indicate that, compared to existing methods, our approach achieves improvements of 5.4%, 1%, 1.6%, and 20.1% in the four metrics of image-to-text (i2t), text-to-image (t2i), image-to-all (i2all), and text-to-all (t2all), respectively.
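To make the i2t and t2i retrieval directions concrete, the following is a minimal sketch of how retrieval in a shared embedding space might be ranked and scored. It is not the authors' method: the feature vectors are toy values standing in for the output of some feature extractor, all function names are hypothetical, and the use of cosine similarity and mean average precision (mAP) is an assumption, since the abstract does not name the similarity measure or the exact metric.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_gallery(query, gallery):
    """Return gallery indices sorted by descending similarity to the query."""
    scores = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)

def average_precision(ranked_labels, query_label):
    """Average precision of one ranked list; an item is relevant if labels match."""
    hits = 0
    total = 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label == query_label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(queries, query_labels, gallery, gallery_labels):
    """Mean of the per-query average precision over all queries."""
    aps = []
    for q, q_label in zip(queries, query_labels):
        order = rank_gallery(q, gallery)
        aps.append(average_precision([gallery_labels[i] for i in order], q_label))
    return sum(aps) / len(aps)

# Toy features (assumed values, not from the paper) in a shared 2-D space.
image_feats = [[1.0, 0.0], [0.0, 1.0]]
image_labels = ["art", "sport"]
text_feats = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.3]]
text_labels = ["art", "sport", "art"]

# i2t: images query the text gallery; t2i: texts query the image gallery.
i2t = mean_average_precision(image_feats, image_labels, text_feats, text_labels)
t2i = mean_average_precision(text_feats, text_labels, image_feats, image_labels)
print(f"i2t mAP = {i2t:.3f}, t2i mAP = {t2i:.3f}")
```

Under the same assumptions, an i2all or t2all score would use a gallery that pools the image and text features together.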

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant No. 62372478) and the Changsha Municipal Natural Science Foundation (No. kq2402262).

Author information

Authors and Affiliations

College of Computer Science and Information Technology, Central South University of Forestry and Technology, Changsha, 410004, China
Jianting Peng, Xuyu Xiang, Jiaohua Qin & Yun Tan

Contributions

Jianting Peng contributed to conceptualization, methodology, and writing (original draft). Xuyu Xiang contributed to conceptualization and writing (original draft, review, and editing). Jiaohua Qin contributed to conceptualization, writing (original draft), and funding acquisition. Yun Tan contributed to supervision and funding acquisition.

Corresponding author

Correspondence to Xuyu Xiang.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Ethical and informed consent

The data used in this paper come from a public dataset, which is cited in the paper, and there are no ethical issues with these data.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Peng, J., Xiang, X., Qin, J. et al. Dual-branch networks for privacy-preserving cross-modal retrieval in cloud computing. J Supercomput 81, 127 (2025). https://doi.org/10.1007/s11227-024-06643-3

Accepted: 21 October 2024
Published: 05 November 2024
DOI: https://doi.org/10.1007/s11227-024-06643-3
