DOI: 10.1145/3581783.3613837

Prototype-guided Knowledge Transfer for Federated Unsupervised Cross-modal Hashing

Published: 27 October 2023

Abstract

Although deep cross-modal hashing methods have recently shown superior performance for cross-modal retrieval, there is a concern about potential data privacy leakage when training the models. Federated learning adopts a distributed machine learning strategy that can collaboratively train models without exposing local private data, making it a promising technique for privacy-preserving cross-modal hashing. However, existing federated learning-based cross-modal retrieval methods usually rely on a large number of semantic annotations, which limits the scalability of the retrieval models. Furthermore, they mostly update the global models by aggregating local model parameters, ignoring differences in the quantity and category distribution of multi-modal data across clients. To address these issues, we propose a Prototype Transfer-based Federated Unsupervised Cross-modal Hashing (PT-FUCH) method for solving the privacy leakage problem in cross-modal retrieval model learning. PT-FUCH protects local private data by exploring unified global prototypes for different clients, without relying on any semantic annotations. The global prototypes guide local cross-modal hash learning and promote alignment of the feature space, thereby alleviating the model bias caused by differences in the distribution of local multi-modal data and improving retrieval accuracy. Additionally, we design an adaptive cross-modal knowledge distillation that transfers valuable semantic knowledge from modal-specific global models to the local prototype learning process, reducing the risk of overfitting. Experimental results on three benchmark cross-modal retrieval datasets validate that PT-FUCH achieves outstanding retrieval performance when trained in a distributed, privacy-preserving mode. The source code is available at https://github.com/exquisite1210/PT-FUCH_P.
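The abstract does not give implementation details, but the core idea it describes, aligning each client's image and text hash features to shared global prototypes while the server aggregates prototypes rather than only model parameters, can be illustrated with a minimal PyTorch-style sketch. Everything below (the number of prototypes, the nearest-prototype pseudo-labeling loss, the size-weighted aggregation, and all function names) is an illustrative assumption based only on the abstract, not the authors' implementation.

```python
# Hypothetical sketch of prototype-guided federated cross-modal hashing.
# NOT the authors' code; losses and aggregation rule are illustrative assumptions.
import torch
import torch.nn.functional as F

def prototype_alignment_loss(features, prototypes, temperature=0.1):
    # Soft-assign each local feature to its nearest global prototype and
    # sharpen that assignment (illustrative pseudo-labeling, not the paper's loss).
    f = F.normalize(features, dim=1)          # (batch, dim)
    p = F.normalize(prototypes, dim=1)        # (K, dim)
    logits = f @ p.t() / temperature          # similarity to every prototype
    targets = logits.detach().argmax(dim=1)   # pseudo-label = closest prototype
    return F.cross_entropy(logits, targets)

def local_update(img_feat, txt_feat, global_protos, img_head, txt_head, optimizer):
    # One hypothetical local step: hash both modalities, align both to the same
    # shared prototypes, and keep paired codes consistent across modalities.
    h_img = torch.tanh(img_head(img_feat))    # relaxed binary codes for images
    h_txt = torch.tanh(txt_head(txt_feat))    # relaxed binary codes for texts
    loss = (prototype_alignment_loss(h_img, global_protos)
            + prototype_alignment_loss(h_txt, global_protos)
            + F.mse_loss(h_img, h_txt))       # cross-modal consistency on paired data
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def aggregate_prototypes(client_protos, client_sizes):
    # Server side: data-size-weighted average of client prototypes,
    # analogous to FedAvg but over prototype matrices rather than raw data.
    w = torch.tensor(client_sizes, dtype=torch.float32)
    w = w / w.sum()
    stacked = torch.stack(client_protos)      # (num_clients, K, dim)
    return (w.view(-1, 1, 1) * stacked).sum(dim=0)
```

In a full training loop, the server would broadcast the aggregated prototypes (and modality-specific global models) to each client, collect the updated local prototypes, and repeat; the adaptive cross-modal knowledge distillation mentioned in the abstract would presumably add a further loss term pulling local outputs toward those of the modal-specific global models.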

Supplemental Material

MP4 File: Presentation video




    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
ISBN: 9798400701085
DOI: 10.1145/3581783
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 27 October 2023


    Author Tags

    1. cross-modal retrieval
    2. federated learning
    3. prototype learning
    4. unsupervised cross-modal hashing
    5. unsupervised learning

    Qualifiers

    • Research-article

    Conference

MM '23: The 31st ACM International Conference on Multimedia
October 29 - November 3, 2023
Ottawa ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

• Downloads (Last 12 months): 418
• Downloads (Last 6 weeks): 39
    Reflects downloads up to 05 Mar 2025

Cited By
• (2025) Enhancing semantic audio-visual representation learning with supervised multi-scale attention. Pattern Analysis and Applications 28:2. https://doi.org/10.1007/s10044-025-01414-z. Online publication date: 11-Feb-2025.
• (2024) Privacy-Enhanced Prototype-Based Federated Cross-Modal Hashing for Cross-Modal Retrieval. ACM Transactions on Multimedia Computing, Communications, and Applications 20:9, 1-19. https://doi.org/10.1145/3674507. Online publication date: 23-Sep-2024.
• (2024) FedCAFE: Federated Cross-Modal Hashing with Adaptive Feature Enhancement. Proceedings of the 32nd ACM International Conference on Multimedia, 9670-9679. https://doi.org/10.1145/3664647.3681319. Online publication date: 28-Oct-2024.
• (2024) Deep Hierarchy-Aware Proxy Hashing With Self-Paced Learning for Cross-Modal Retrieval. IEEE Transactions on Knowledge and Data Engineering 36:11, 5926-5939. https://doi.org/10.1109/TKDE.2024.3401050. Online publication date: 1-Nov-2024.
• (2024) RoMo: Robust Unsupervised Multimodal Learning With Noisy Pseudo Labels. IEEE Transactions on Image Processing 33, 5086-5097. https://doi.org/10.1109/TIP.2024.3426482. Online publication date: 27-Aug-2024.
• (2024) Cross-Modal Retrieval: A Systematic Review of Methods and Future Directions. Proceedings of the IEEE 112:11, 1716-1754. https://doi.org/10.1109/JPROC.2024.3525147. Online publication date: Nov-2024.
• (2024) Supervised Contrastive Discrete Hashing for cross-modal retrieval. Knowledge-Based Systems 295:C. https://doi.org/10.1016/j.knosys.2024.111837. Online publication date: 18-Jul-2024.
• (2024) Semantic-aware matrix factorization hashing with intra- and inter-modality fusion for image-text retrieval. Applied Intelligence 55:1. https://doi.org/10.1007/s10489-024-06060-2. Online publication date: 19-Nov-2024.
