
MUP: Multi-granularity Unified Perception for Panoramic Activity Recognition

Published: 27 October 2023

Abstract

Panoramic activity recognition aims to jointly identify multi-granularity human behaviors, including individual actions, group activities, and global activities, in multi-person videos. Previous methods encode these behaviors hierarchically through multiple stages, which disturbs the inherent co-occurrence of multi-granularity behaviors within the same scene. To this end, we propose a novel Multi-granularity Unified Perception (MUP) framework that perceives behaviors at all granularities uniformly, exploring their co-occurring motion patterns with the same parameters in an end-to-end fashion. Specifically, the proposed framework stacks three Unified Motion Encoding (UME) blocks that model behaviors at multiple granularities with shared parameters. Each UME block synchronously mines intra-relevant and cross-relevant semantics from the input feature sequences via Intra-granularity Motion Embedding (IME) and Cross-granularity Motion Prototyping (CMP). In particular, IME models the interactions among visual features within each granularity using an attention mechanism, while CMP aggregates features across granularities (e.g., person to group) via several learnable prototypes. Extensive experiments demonstrate that MUP outperforms state-of-the-art methods on JRDB-PAR and offers satisfactory interpretability.
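
To make the high-level description concrete, the following is a minimal PyTorch sketch of one UME block. All module names, dimensionalities, and the exact wiring are assumptions inferred from the abstract alone, not the authors' implementation: IME is rendered here as intra-granularity self-attention, and CMP as cross-attention from a small set of learnable prototypes that pool a finer granularity into a coarser one.

```python
# Hypothetical sketch of a Unified Motion Encoding (UME) block, inferred from
# the abstract only. IME: self-attention within one granularity. CMP: learnable
# prototypes that aggregate features into the next-coarser granularity.
import torch
import torch.nn as nn


class UMEBlock(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4, num_prototypes: int = 8):
        super().__init__()
        # IME: intra-granularity self-attention over the feature sequence.
        self.ime_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ime_norm = nn.LayerNorm(dim)
        # CMP: learnable prototypes acting as queries over finer-level features.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, dim))
        self.cmp_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cmp_norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # feats: (B, N, dim) features of one granularity (e.g., N person tracks).
        # IME: refine features within the current granularity.
        intra, _ = self.ime_attn(feats, feats, feats)
        intra = self.ime_norm(feats + intra)
        # CMP: prototypes attend to the refined features, producing a coarser
        # granularity representation (e.g., person -> group candidates).
        queries = self.prototypes.unsqueeze(0).expand(feats.size(0), -1, -1)
        coarse, _ = self.cmp_attn(queries, intra, intra)
        coarse = self.cmp_norm(coarse)
        return intra, coarse


# Applying the same block instance at successive granularities mirrors the
# abstract's claim that the stacked UME blocks share parameters.
block = UMEBlock()
person_feats = torch.randn(2, 12, 256)            # batch of 2, 12 person tracks
person_refined, group_feats = block(person_feats)  # person -> group level
group_refined, global_feats = block(group_feats)   # group -> global level
```

Reusing one block instance across person, group, and global levels, as in the usage lines above, is one plausible reading of "modeling multiple granularity behaviors with shared parameters".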


Cited By

  • Label Text-aided Hierarchical Semantics Mining for Panoramic Activity Recognition. In Proceedings of the 32nd ACM International Conference on Multimedia (2024), 8139-8148. DOI: 10.1145/3664647.3681329
  • AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition. In Proceedings of the 32nd ACM International Conference on Multimedia (2024), 691-700. DOI: 10.1145/3664647.3680755
  • Spatio-temporal interactive reasoning model for multi-group activity recognition. Pattern Recognition (2024), 111104. DOI: 10.1016/j.patcog.2024.111104
  • Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition. In Computer Vision – ECCV 2024 (2024), 19-36. DOI: 10.1007/978-3-031-73242-3_2

    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN: 9798400701085
    DOI: 10.1145/3581783

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. action recognition
    2. hierarchical learning
    3. semantic aggregation

    Qualifiers

    • Research-article

    Funding Sources

    • the National Key R&D Program of China
    • the China Postdoctoral Science Foundation
    • the Natural Science Foundation of Jiangsu Province
    • the National Natural Science Foundation of China

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa, ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
