
Knowledge-driven Egocentric Multimodal Activity Recognition

Published: 17 December 2020

Abstract

Recognizing activities from egocentric multimodal data collected by wearable cameras and sensors is gaining interest, as multimodal methods benefit from the complementarity of different modalities. However, high-dimensional videos contain rich high-level semantic information, while low-dimensional sensor signals describe simple motion patterns of the wearer; this large modality gap between the videos and the sensor signals makes fusing the raw data challenging. Moreover, the lack of large-scale egocentric multimodal datasets, due to the cost of data collection and annotation, poses another challenge for employing complex deep learning models. To jointly address these two challenges, we propose a knowledge-driven multimodal activity recognition framework that exploits external knowledge to fuse multimodal data and reduce the dependence on large-scale training samples. Specifically, we design a dual-GCLSTM (Graph Convolutional LSTM) and a multi-layer GCN (Graph Convolutional Network) to collectively model the relations among activities and intermediate objects. The dual-GCLSTM fuses temporal multimodal features under top-down relation-aware guidance, and a co-attention mechanism adaptively attends to the features of different modalities at each timestep. The multi-layer GCN learns relation-aware classifiers for the activity categories. Experimental results on three publicly available egocentric multimodal datasets demonstrate the effectiveness of the proposed model.
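To make the architectural terms in the abstract concrete, the following is a minimal, illustrative sketch rather than the authors' implementation: it shows a standard graph-convolution propagation step over an activity/object relation graph and a simple per-timestep co-attention that reweights video and sensor features before fusion. All function names, tensor shapes, and the plain-NumPy phrasing are assumptions made for illustration only.

```python
# Illustrative sketch only -- not the paper's model. Shows (1) a generic GCN
# propagation step and (2) a toy two-modality co-attention fusion.
import numpy as np

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])           # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    norm_adj = d_inv_sqrt @ a_hat @ d_inv_sqrt   # symmetric normalization
    return np.maximum(norm_adj @ feats @ weight, 0.0)

def co_attention_fuse(video_feat, sensor_feat, w_video, w_sensor):
    """Score each modality at one timestep, then return the attention-weighted sum."""
    scores = np.array([video_feat @ w_video, sensor_feat @ w_sensor])
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over the two modalities
    return weights[0] * video_feat + weights[1] * sensor_feat

# Toy usage: 5 relation-graph nodes with 16-d embeddings, 64-d modality features.
rng = np.random.default_rng(0)
adj = (rng.random((5, 5)) > 0.5).astype(float)
node_emb = gcn_layer(adj, rng.standard_normal((5, 16)), rng.standard_normal((16, 32)))
fused = co_attention_fuse(rng.standard_normal(64), rng.standard_normal(64),
                          rng.standard_normal(64), rng.standard_normal(64))
```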



Published In

ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 16, Issue 4
November 2020
372 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/3444749

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 December 2020
Accepted: 01 July 2020
Revised: 01 June 2020
Received: 01 January 2020
Published in TOMM Volume 16, Issue 4


Author Tags

  1. Egocentric videos
  2. graph neural networks
  3. wearable sensors

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Key Research Program of Frontier Sciences of CAS
  • Research Program of National Laboratory of Pattern Recognition
  • National Natural Science Foundation of China
  • National Key Research and Development Program of China


Article Metrics

  • Downloads (Last 12 months): 74
  • Downloads (Last 6 weeks): 6
Reflects downloads up to 17 Jan 2025


Cited By

  • (2024) From CNNs to Transformers in Multimodal Human Action Recognition: A Survey. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 8 (2024), 1--24. https://doi.org/10.1145/3664815. Online publication date: 13-May-2024.
  • (2024) Cross-Modal Federated Human Activity Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 46, 8 (2024), 5345--5361. https://doi.org/10.1109/TPAMI.2024.3367412. Online publication date: Aug-2024.
  • (2024) A survey of multimodal federated learning: background, applications, and perspectives. Multimedia Systems 30, 4 (2024). https://doi.org/10.1007/s00530-024-01422-9. Online publication date: 29-Jul-2024.
  • (2023) Toward Egocentric Compositional Action Anticipation with Adaptive Semantic Debiasing. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 5 (2023), 1--21. https://doi.org/10.1145/3633333. Online publication date: 4-Dec-2023.
  • (2023) 3D Object Watermarking from Data Hiding in the Homomorphic Encrypted Domain. ACM Transactions on Multimedia Computing, Communications, and Applications 19, 5s (2023), 1--20. https://doi.org/10.1145/3588573. Online publication date: 7-Jun-2023.
  • (2023) Less Is More: Learning from Synthetic Data with Fine-Grained Attributes for Person Re-Identification. ACM Transactions on Multimedia Computing, Communications, and Applications 19, 5s (2023), 1--20. https://doi.org/10.1145/3588441. Online publication date: 7-Jun-2023.
  • (2023) Neural Network Assisted Depth Map Packing for Compression Using Standard Hardware Video Codecs. ACM Transactions on Multimedia Computing, Communications, and Applications 19, 5s (2023), 1--20. https://doi.org/10.1145/3588440. Online publication date: 7-Jun-2023.
  • (2023) A Geometrical Approach to Evaluate the Adversarial Robustness of Deep Neural Networks. ACM Transactions on Multimedia Computing, Communications, and Applications 19, 5s (2023), 1--17. https://doi.org/10.1145/3587936. Online publication date: 7-Jun-2023.
  • (2023) Local Bidirection Recurrent Network for Efficient Video Deblurring with the Fused Temporal Merge Module. ACM Transactions on Multimedia Computing, Communications, and Applications 19, 5s (2023), 1--18. https://doi.org/10.1145/3587468. Online publication date: 7-Jun-2023.
  • (2023) Video Captioning by Learning from Global Sentence and Looking Ahead. ACM Transactions on Multimedia Computing, Communications, and Applications 19, 5s (2023), 1--20. https://doi.org/10.1145/3587252. Online publication date: 7-Jun-2023.
