Dual-branch networks for privacy-preserving cross-modal retrieval in cloud computing

The Journal of Supercomputing

Abstract

Cross-modal retrieval overcomes the limitations of individual modalities by retrieving information across data of different modalities, meeting users' needs for multi-modal correlated retrieval. While cloud computing offers high efficiency and low cost, concerns about data security impede its full potential. Privacy-preserving cross-modal retrieval emerges as a viable solution, meeting users' demands for efficient retrieval while safeguarding data confidentiality. However, a major challenge remains in this field: how to bridge the inherent semantic gap within heterogeneous and chaotic information. To address this challenge, this paper proposes dual-branch networks for privacy-preserving cross-modal retrieval in cloud computing. First, a dual-branch feature extraction network for encrypted image-text data is constructed, enhancing the extraction of meaningful features from encrypted data. Second, a cross-modal alignment method is designed to eliminate the heterogeneous gap between modalities through intra-modal and inter-modal alignment. Finally, to fully exploit the storage and computing advantages of cloud computing, both the encrypted data and the cross-modal feature extractor are deployed to the cloud. The dynamic update capability of cloud-stored encrypted data enables continuous model refinement, improving retrieval accuracy while reducing the storage and computational burdens on data owners. Extensive experiments on the publicly available benchmark image-text dataset Wikipedia indicate that, compared to existing methods, our approach achieves improvements of 5.4%, 1%, 1.6%, and 20.1% on the four metrics of image-to-text (i2t), text-to-image (t2i), image-to-all (i2all), and text-to-all (t2all), respectively.
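
To make the i2t and t2i retrieval directions concrete, the following is a minimal sketch of ranking in a shared embedding space. It is not the authors' method: the abstract does not state the similarity measure or how the metrics are computed, so cosine similarity, average precision, the toy two-dimensional embeddings, and the helper names `cosine`, `rank`, and `average_precision` are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Set


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query: Sequence[float], gallery: List[Sequence[float]]) -> List[int]:
    """Gallery indices ordered from most to least similar to the query."""
    sims = [cosine(query, item) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: sims[i], reverse=True)


def average_precision(ranking: List[int], relevant: Set[int]) -> float:
    """Mean of precision values at each rank where a relevant item appears."""
    hits = 0
    total = 0.0
    for position, index in enumerate(ranking, start=1):
        if index in relevant:
            hits += 1
            total += hits / position
    return total / hits if hits else 0.0


# Hypothetical embeddings, assumed to be already projected into one shared
# space by two modality-specific branches (toy values, not from the paper).
image_emb = [[1.0, 0.0], [0.0, 1.0]]
image_labels = [0, 1]
text_emb = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.3]]
text_labels = [0, 1, 0]

# i2t: query with image 0, rank all texts, score against same-class texts.
i2t_ranking = rank(image_emb[0], text_emb)
i2t_relevant = {i for i, y in enumerate(text_labels) if y == image_labels[0]}
i2t_ap = average_precision(i2t_ranking, i2t_relevant)

# t2i: query with text 1, rank all images, score against same-class images.
t2i_ranking = rank(text_emb[1], image_emb)
t2i_relevant = {i for i, y in enumerate(image_labels) if y == text_labels[1]}
t2i_ap = average_precision(t2i_ranking, t2i_relevant)
```

Averaging such per-query scores over all queries in one direction would give a single figure per direction; whether the reported i2t and t2i numbers are computed this way is not stated in the abstract.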

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant No. 62372478) and the Changsha Municipal Natural Science Foundation (No. kq2402262).

Author information

Contributions

Jianting Peng contributed to conceptualization, methodology, and writing of the original draft. Xuyu Xiang contributed to conceptualization and to writing of the original draft, review, and editing. Jiaohua Qin contributed to conceptualization, writing of the original draft, and funding acquisition. Yun Tan contributed to supervision and funding acquisition.

Corresponding author

Correspondence to Xuyu Xiang.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Ethical and informed consent

The data used in this paper come from a public dataset, which is cited in the paper. There are no ethical issues with these data.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Peng, J., Xiang, X., Qin, J. et al. Dual-branch networks for privacy-preserving cross-modal retrieval in cloud computing. J Supercomput 81, 127 (2025). https://doi.org/10.1007/s11227-024-06643-3

  • DOI: https://doi.org/10.1007/s11227-024-06643-3
