research-article

PATCH: A Plug-in Framework of Non-blocking Inference for Distributed Multimodal System

Authors:

Guangjing Wang,

Tianxing Li

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 7, Issue 3

Article No.: 130, Pages 1 - 24

https://doi.org/10.1145/3610885

Published: 27 September 2023

Abstract

Recent advances in deep learning have shown that multimodal inference can be particularly useful in tasks such as autonomous driving, human health, and production line monitoring. However, deploying state-of-the-art multimodal models in distributed IoT systems poses unique challenges, since sensor data from low-cost edge devices can be corrupted, lost, or delayed before reaching the cloud. These problems are magnified by asymmetric data generation rates across sensor modalities, wireless network dynamics, and unpredictable sensor behavior, which lead to either increased latency or degraded inference accuracy. Either outcome can disrupt the normal operation of the system, with potentially severe consequences such as human injury or car accidents. In this paper, we propose PATCH, a framework of speculative inference that adapts to these complex scenarios. PATCH serves as a plug-in module for existing multimodal models and enables speculative inference with these off-the-shelf deep learning models. PATCH consists of 1) a Masked-AutoEncoder-based cross-modality imputation module that imputes missing data from partially available sensor data, 2) a lightweight feature pair ranking module that effectively limits the search space for the optimal imputation configuration with low computation overhead, and 3) a data alignment module that aligns heterogeneous multimodal data streams without relying on accurate timestamps or external synchronization mechanisms. We implement PATCH in nine popular multimodal models using five public datasets and one self-collected dataset. The experimental results show that PATCH achieves up to 13% mean accuracy improvement over the state-of-the-art method while using only 10% of the training data and reducing the training overhead by 73% compared with the cost of retraining the model.
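The abstract states that PATCH aligns streams without accurate timestamps but does not say how. For orientation only, here is a minimal sketch of a classic timestamp-free baseline, not PATCH's own module: it estimates the relative delay between two sampled streams by maximizing their normalized cross-correlation over a bounded lag window, then trims both streams so that equal indices refer to the same instant. It assumes two one-dimensional streams with a common sampling rate that respond to the same underlying events; the names `estimate_lag` and `align` and the `max_lag` bound are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def _zscore(x: Sequence[float]) -> List[float]:
    """Standardize a stream to zero mean and unit variance (a flat stream keeps std = 1)."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n) or 1.0
    return [(v - mean) / std for v in x]


def estimate_lag(a: Sequence[float], b: Sequence[float], max_lag: int) -> int:
    """Return the shift k in [-max_lag, max_lag] that maximizes the average
    product of standardized samples a[t] * b[t + k] over their overlap.

    A positive k means stream b lags stream a by k samples (b[t + k] ~ a[t]).
    Assumption: both streams share one sampling rate and reflect the same events.
    """
    za, zb = _zscore(a), _zscore(b)
    best_lag, best_score = 0, -math.inf
    for k in range(-max_lag, max_lag + 1):
        # Only pair indices where both streams actually have a sample.
        products = [za[t] * zb[t + k] for t in range(len(za)) if 0 <= t + k < len(zb)]
        if len(products) < 2:
            continue
        score = sum(products) / len(products)
        if score > best_score:
            best_lag, best_score = k, score
    return best_lag


def align(a: Sequence[float], b: Sequence[float], lag: int) -> Tuple[List[float], List[float]]:
    """Trim both streams so that index i refers to the same instant in each."""
    a, b = list(a), list(b)
    if lag >= 0:
        b = b[lag:]   # drop b's leading samples that have no counterpart in a
    else:
        a = a[-lag:]  # drop a's leading samples that have no counterpart in b
    n = min(len(a), len(b))
    return a[:n], b[:n]


if __name__ == "__main__":
    rng = random.Random(0)
    stream_a = [rng.gauss(0.0, 1.0) for _ in range(200)]  # synthetic reference stream
    stream_b = [0.0] * 3 + stream_a[:-3]                   # same signal, arriving 3 samples late
    lag = estimate_lag(stream_a, stream_b, max_lag=10)
    aligned_a, aligned_b = align(stream_a, stream_b, lag)
    print(lag, len(aligned_a))
```

Run as a script, the demo recovers the 3-sample delay injected into the synthetic stream. Genuinely heterogeneous modalities (for example, video and inertial data) would presumably need to be mapped to a comparable one-dimensional feature before a correlation search like this one is meaningful, and the brute-force search costs O(n · max_lag) operations.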


Cited By

Hu Y, Zhang S, Dang T, Jia H, Salim F, Hu W, Quigley A, Kostakos V, Kay J, Hoang T (2024). Exploring Large-Scale Language Models to Evaluate EEG-Based Multimodal Data for Mental Health. Companion of the 2024 on ACM International Joint Conference on Pervasive and Ubiquitous Computing, 412-417. DOI: 10.1145/3675094.3678494. Online publication date: 5-Oct-2024.
https://dl.acm.org/doi/10.1145/3675094.3678494
Liu X, Liu H, Li J, Yang Z, Huang Y, Zhang J, Kostakos V, Kay J, Hoang T (2024). AcousAF: Acoustic Sensing-Based Atrial Fibrillation Detection System for Mobile Phones. Companion of the 2024 on ACM International Joint Conference on Pervasive and Ubiquitous Computing, 377-383. DOI: 10.1145/3675094.3678488. Online publication date: 5-Oct-2024.
https://dl.acm.org/doi/10.1145/3675094.3678488
Sahu N, Gupta S, Lone H (2024). Wearable Technology Insights: Unveiling Physiological Responses During Three Different Socially Anxious Activities. ACM Journal on Computing and Sustainable Societies, Volume 2, Issue 2, 1-23. DOI: 10.1145/3663671. Online publication date: 20-Jun-2024.
https://dl.acm.org/doi/10.1145/3663671

Information

Published In


Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies Volume 7, Issue 3

September 2023

1734 pages

EISSN:2474-9567

DOI:10.1145/3626192


Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 September 2023

Published in IMWUT Volume 7, Issue 3


Qualifiers

Research-article
Research
Refereed


Bibliometrics & Citations

Article Metrics

Total Citations: 17
Total Downloads: 321
Downloads (Last 12 months): 183
Downloads (Last 6 weeks): 15

Reflects downloads up to 18 Jan 2025

Citations

Wang J, Feng Y, Kumbhar G, Wang G, Yan Q, Jin Q, Ferrier R, Xiong J, Li T, Okoshi T, Ko J, LiKamWa R (2024). SoilCares: Towards Low-cost Soil Macronutrients and Moisture Monitoring Using RF-VNIR Sensing. Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services, 196-209. DOI: 10.1145/3643832.3661868. Online publication date: 3-Jun-2024.
https://dl.acm.org/doi/10.1145/3643832.3661868
Liu C, Dong Z, Huang L, Yan W, Wang X, Fang D, Chen X (2024). TagSleep3D. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 8, Issue 1, 1-28. DOI: 10.1145/3643512. Online publication date: 6-Mar-2024.
https://dl.acm.org/doi/10.1145/3643512
Chang Z, Zhang F, Xiong J, Chen W, Zhang D, Ganesan D, Lane N, Shi W (2024). MSense: Boosting Wireless Sensing Capability Under Motion Interference. Proceedings of the 30th Annual International Conference on Mobile Computing and Networking, 108-123. DOI: 10.1145/3636534.3649350. Online publication date: 29-May-2024.
https://dl.acm.org/doi/10.1145/3636534.3649350
Zhang X, Zhang D, Xie Y, Wu D, Li Y, Zhang D (2024). Waffle. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 7, Issue 4, 1-29. DOI: 10.1145/3631458. Online publication date: 12-Jan-2024.
https://dl.acm.org/doi/10.1145/3631458
Zhang D, Zhang X, Xie Y, Zhang F, Wang X, Li Y, Zhang D (2024). LoCal. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 7, Issue 4, 1-27. DOI: 10.1145/3631436. Online publication date: 12-Jan-2024.
https://dl.acm.org/doi/10.1145/3631436
Yang Y, Ren L, Chen C, Hu B, Zhang Z, Li X, Shen Y, Zhu K, Ji J, Zhang Y, Ni Y, Wu J, Wang Q, Wu J, Sun L, Tao Y, Wang G (2024). SnapInflatables: Designing Inflatables with Snap-through Instability for Responsive Interaction. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-15. DOI: 10.1145/3613904.3642933. Online publication date: 11-May-2024.
https://dl.acm.org/doi/10.1145/3613904.3642933
Mäder A, Meegahapola L, Gatica-Perez D (2024). Learning About Social Context From Smartphone Data: Generalization Across Countries and Daily Life Moments. Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, 1-18. DOI: 10.1145/3613904.3642444. Online publication date: 11-May-2024.
https://dl.acm.org/doi/10.1145/3613904.3642444
