Abstract
Cross-modal retrieval overcomes the limits of single-modality search by retrieving information across data of different modalities, meeting users' needs for multi-modal correlated retrieval. Cloud computing offers high efficiency and low cost, but concerns about data security hinder its adoption. Privacy-preserving cross-modal retrieval emerges as a viable solution, meeting users' demands for efficient retrieval while safeguarding data confidentiality. However, a major challenge remains in this field: how to bridge the inherent semantic gap within heterogeneous and chaotic information. To address this challenge, this paper proposes dual-branch networks for privacy-preserving cross-modal retrieval in cloud computing. First, a dual-branch feature extraction network for encrypted image-text data is constructed, enhancing the extraction of meaningful features from encrypted data. Second, a cross-modal alignment method is designed to eliminate the heterogeneity gap between modalities through alignment both within and between modalities. Finally, to fully exploit the storage and computing advantages of cloud computing, both the encrypted data and the cross-modal feature extractor are deployed to the cloud. The dynamic updating of cloud-stored encrypted data enables continuous model refinement, improving retrieval accuracy while reducing the storage and computational burdens on data owners. Extensive experiments on the publicly available benchmark image-text dataset Wikipedia indicate that, compared to existing methods, our approach achieves improvements of 5.4%, 1%, 1.6%, and 20.1% on the four metrics of image-to-text (i2t), text-to-image (t2i), image-to-all (i2all), and text-to-all (t2all), respectively.
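To make the idea of alignment "within and between modalities" concrete, the following is a minimal sketch and not the loss used in this paper. It assumes each branch has already mapped its encrypted input to a vector in a shared space. The intra-modal term (distance to a class centroid), the inter-modal term (distance between paired image and text embeddings), the `weight` parameter, and all function names are hypothetical choices made for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def intra_modal_loss(embeddings, labels):
    """Assumed intra-modal term: mean squared distance of each
    embedding to the centroid of its class within one modality."""
    by_class = {}
    for e, y in zip(embeddings, labels):
        by_class.setdefault(y, []).append(e)
    centers = {y: centroid(vs) for y, vs in by_class.items()}
    total = sum(sq_dist(e, centers[y]) for e, y in zip(embeddings, labels))
    return total / len(embeddings)

def inter_modal_loss(img_emb, txt_emb):
    """Assumed inter-modal term: mean squared distance between
    the embeddings of each paired image and text."""
    total = sum(sq_dist(i, t) for i, t in zip(img_emb, txt_emb))
    return total / len(img_emb)

def alignment_loss(img_emb, txt_emb, labels, weight=1.0):
    """Hypothetical combined objective: intra-modal terms for both
    branches plus a weighted inter-modal term."""
    intra = intra_modal_loss(img_emb, labels) + intra_modal_loss(txt_emb, labels)
    return intra + weight * inter_modal_loss(img_emb, txt_emb)

def retrieve(query, gallery):
    """Gallery indices ranked by descending cosine similarity to the query."""
    return sorted(range(len(gallery)),
                  key=lambda j: cosine(query, gallery[j]),
                  reverse=True)

# Toy embeddings standing in for the outputs of the two branches.
img_emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
txt_emb = [[0.8, 0.2], [1.0, 0.0], [0.1, 0.9]]
labels = [0, 0, 1]

loss = alignment_loss(img_emb, txt_emb, labels)
i2t_ranking = retrieve(img_emb[2], txt_emb)  # image query, text gallery
```

In this toy setup, the image of class 1 ranks its paired text first, and the loss falls to zero only when paired embeddings coincide and every class collapses to a single point in each modality. A trained system would minimize such an objective by gradient descent over the branch parameters, which this sketch omits.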





Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Grant No. 62372478) and the Changsha Municipal Natural Science Foundation (Grant No. kq2402262).
Author information
Contributions
Jianting Peng helped in conceptualization, methodology, and writing—original draft. Xuyu Xiang helped in conceptualization and writing—original draft, reviewing, and editing. Jiaohua Qin helped in conceptualization, writing—original draft, and funding acquisition. Yun Tan worked in supervision and funding acquisition.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Ethical and informed consent
The data used in this paper come from a public dataset, which is cited in the paper, and raise no ethical issues.
About this article
Cite this article
Peng, J., Xiang, X., Qin, J. et al. Dual-branch networks for privacy-preserving cross-modal retrieval in cloud computing. J Supercomput 81, 127 (2025). https://doi.org/10.1007/s11227-024-06643-3
DOI: https://doi.org/10.1007/s11227-024-06643-3