DOI: 10.1145/3581791.3596844

Harmony: Heterogeneous Multi-Modal Federated Learning through Disentangled Model Training

Published: 18 June 2023

Abstract

Multi-modal sensing systems are increasingly prevalent in real-world applications such as health monitoring and autonomous driving. Most multi-modal learning approaches need access to users' raw data, which raises significant privacy concerns. Federated learning (FL) provides a privacy-aware distributed learning framework. However, current FL approaches have not addressed the unique challenges of heterogeneous multi-modal FL systems, such as modality heterogeneity and significantly longer training delay. In this paper, we propose Harmony, a new system for heterogeneous multi-modal federated learning. Harmony disentangles multi-modal network training into a novel two-stage framework: modality-wise federated learning followed by federated fusion learning. By integrating a novel balance-aware resource allocation mechanism into modality-wise FL and exploiting modality biases in federated fusion learning, Harmony improves model accuracy under non-i.i.d. data distributions and speeds up system convergence. We implemented Harmony on a real-world multi-modal sensor testbed deployed in the homes of 16 elderly subjects for Alzheimer's Disease monitoring. Our evaluation on the testbed and on three large-scale public datasets from different applications shows that Harmony outperforms state-of-the-art baselines by up to 46.35% in accuracy and reduces training delay by up to 30%.
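
To make the idea of modality-wise federated learning under modality heterogeneity more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each client trains one model per modality it owns and that the server averages each modality's model only over the clients holding that modality, weighted by sample count (a FedAvg-style rule). The flat weight vectors, the modality names (`audio`, `depth`, `radar`), and the function names are all hypothetical; the balance-aware resource allocation and the federated fusion stage described in the abstract are not modeled here.

```python
from typing import Dict, List

# Hypothetical types: a "model" is a flat list of floats, and each client
# reports one locally trained model per modality it actually has.
Weights = List[float]
ClientUpdate = Dict[str, Weights]  # modality name -> local weights


def weighted_average(models: List[Weights], sizes: List[int]) -> Weights:
    """FedAvg-style average of equally shaped weight vectors,
    weighted by each client's sample count."""
    total = sum(sizes)
    return [
        sum(w[i] * n for w, n in zip(models, sizes)) / total
        for i in range(len(models[0]))
    ]


def aggregate_per_modality(
    updates: List[ClientUpdate], sizes: List[int]
) -> Dict[str, Weights]:
    """Aggregate each modality's model over only the clients that own it."""
    modalities = sorted({m for u in updates for m in u})
    result: Dict[str, Weights] = {}
    for m in modalities:
        owners = [(u[m], n) for u, n in zip(updates, sizes) if m in u]
        result[m] = weighted_average(
            [w for w, _ in owners], [n for _, n in owners]
        )
    return result


# Three hypothetical clients with different modality sets and data sizes.
updates = [
    {"audio": [1.0, 2.0], "depth": [0.0, 4.0]},
    {"audio": [3.0, 4.0]},
    {"depth": [2.0, 0.0], "radar": [5.0, 5.0]},
]
sizes = [10, 30, 10]
global_models = aggregate_per_modality(updates, sizes)
print(global_models)
```

In this toy setup, a client missing a modality simply does not contribute to that modality's global model, which is one simple way to tolerate heterogeneous sensor sets; how Harmony itself schedules and balances modality-wise training is described in the paper, not here.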



Published In

MobiSys '23: Proceedings of the 21st Annual International Conference on Mobile Systems, Applications and Services
June 2023
651 pages
ISBN: 9798400701108
DOI: 10.1145/3581791
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. multi-modal federated learning systems
  2. modality heterogeneity
  3. balance-aware resource allocation

Qualifiers

  • Research-article

Funding Sources

  • Research Grants Council (RGC) of Hong Kong
  • Alzheimer's Drug Discovery Foundation
  • National Natural Science Foundation of China
  • Shenzhen Science and Technology Program
  • Guangdong Basic and Applied Basic Research Foundation
  • Shenzhen Key Lab of Crowd Intelligence Empowered Low-Carbon Energy Network
  • Shenzhen Institute of Artificial Intelligence and Robotics for Society

Conference

MobiSys '23

Acceptance Rates

MobiSys '23 Paper Acceptance Rate 41 of 198 submissions, 21%;
Overall Acceptance Rate 274 of 1,679 submissions, 16%

Cited By

  • (2025) ClassTer: Mobile Shift-Robust Personalized Federated Learning via Class-Wise Clustering. IEEE Transactions on Mobile Computing 24:3 (2014-2028). DOI: 10.1109/TMC.2024.3487294. Online publication date: Mar-2025
  • (2024) MultimodalHD: Federated Learning Over Heterogeneous Sensor Modalities using Hyperdimensional Computing. 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE) (1-6). DOI: 10.23919/DATE58400.2024.10546794. Online publication date: 25-Mar-2024
  • (2024) Towards Efficient Heterogeneous Multi-Modal Federated Learning with Hierarchical Knowledge Disentanglement. Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems (592-605). DOI: 10.1145/3666025.3699360. Online publication date: 4-Nov-2024
  • (2024) Effective Heterogeneous Federated Learning via Efficient Hypernetwork-based Weight Generation. Proceedings of the 22nd ACM Conference on Embedded Networked Sensor Systems (112-125). DOI: 10.1145/3666025.3699326. Online publication date: 4-Nov-2024
  • (2024) ERL-MR: Harnessing the Power of Euler Feature Representations for Balanced Multi-modal Learning. Proceedings of the 32nd ACM International Conference on Multimedia (4591-4600). DOI: 10.1145/3664647.3681215. Online publication date: 28-Oct-2024
  • (2024) EchoPFL. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8:1 (1-22). DOI: 10.1145/3643560. Online publication date: 6-Mar-2024
  • (2024) Age of Information Based Client Selection for Wireless Federated Learning With Diversified Learning Capabilities. IEEE Transactions on Mobile Computing 23:12 (14934-14945). DOI: 10.1109/TMC.2024.3450549. Online publication date: Dec-2024
  • (2024) LLMSense: Harnessing LLMs for High-level Reasoning Over Spatiotemporal Sensor Traces. 2024 IEEE 3rd Workshop on Machine Learning on Edge in Sensor Systems (SenSys-ML) (9-14). DOI: 10.1109/SenSys-ML62579.2024.00007. Online publication date: 13-May-2024
  • (2024) Hybrid Federated Learning for Multimodal IoT Systems. IEEE Internet of Things Journal 11:21 (34055-34064). DOI: 10.1109/JIOT.2024.3443267. Online publication date: 1-Nov-2024
  • (2024) Demo Abstract: AD-CLIP: Privacy-Preserving, Low-Cost Synthetic Human Action Dataset for Alzheimer’s Patients via CLIP-based Models. 2024 23rd ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN) (257-258). DOI: 10.1109/IPSN61024.2024.00029. Online publication date: 13-May-2024
