
MUP: Multi-granularity Unified Perception for Panoramic Activity Recognition

Published: 27 October 2023

Abstract

Panoramic activity recognition aims to jointly identify multi-granularity human behaviors, including individual actions, group activities, and global activities, in multi-person videos. Previous methods encode these behaviors hierarchically through multiple stages, which disturbs the inherent co-occurrence of multi-granularity behaviors within the same scene. To this end, we propose a novel Multi-granularity Unified Perception (MUP) framework that perceives behaviors at all granularities uniformly, exploring their co-occurring motion patterns with the same parameters in an end-to-end fashion. Specifically, the proposed framework stacks three Unified Motion Encoding (UME) blocks that model behaviors at multiple granularities with shared parameters. Each UME block synchronously mines intra-relevant and cross-relevant semantics from the input feature sequences via Intra-granularity Motion Embedding (IME) and Cross-granularity Motion Prototyping (CMP). In particular, IME models the interactions among visual features within each granularity using an attention mechanism, while CMP aggregates features across granularities (e.g., person to group) via several learnable prototypes. Extensive experiments demonstrate that MUP outperforms state-of-the-art methods on JRDB-PAR and offers satisfactory interpretability.
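
To make the high-level description concrete, the following is a minimal PyTorch sketch of one UME block. All module names, dimensionalities, and the exact wiring are assumptions inferred from the abstract alone, not the authors' implementation: IME is rendered here as intra-granularity self-attention, and CMP as cross-attention from a small set of learnable prototypes that pool a finer granularity into a coarser one.

```python
# Hypothetical sketch of a Unified Motion Encoding (UME) block, inferred from
# the abstract only. IME: self-attention within one granularity. CMP: learnable
# prototypes that aggregate features into the next-coarser granularity.
import torch
import torch.nn as nn


class UMEBlock(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 4, num_prototypes: int = 8):
        super().__init__()
        # IME: intra-granularity self-attention over the feature sequence.
        self.ime_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ime_norm = nn.LayerNorm(dim)
        # CMP: learnable prototypes acting as queries over finer-level features.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, dim))
        self.cmp_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cmp_norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # feats: (B, N, dim) features of one granularity (e.g., N person tracks).
        # IME: refine features within the current granularity.
        intra, _ = self.ime_attn(feats, feats, feats)
        intra = self.ime_norm(feats + intra)
        # CMP: prototypes attend to the refined features, producing a coarser
        # granularity representation (e.g., person -> group candidates).
        queries = self.prototypes.unsqueeze(0).expand(feats.size(0), -1, -1)
        coarse, _ = self.cmp_attn(queries, intra, intra)
        coarse = self.cmp_norm(coarse)
        return intra, coarse


# Applying the same block instance at successive granularities mirrors the
# abstract's claim that the stacked UME blocks share parameters.
block = UMEBlock()
person_feats = torch.randn(2, 12, 256)            # batch of 2, 12 person tracks
person_refined, group_feats = block(person_feats)  # person -> group level
group_refined, global_feats = block(group_feats)   # group -> global level
```

Reusing one block instance across person, group, and global levels, as in the usage lines above, is one plausible reading of "modeling multiple granularity behaviors with shared parameters".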


Cited By

  • Label Text-aided Hierarchical Semantics Mining for Panoramic Activity Recognition. In Proceedings of the 32nd ACM International Conference on Multimedia (2024), 8139-8148. DOI: 10.1145/3664647.3681329
  • AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition. In Proceedings of the 32nd ACM International Conference on Multimedia (2024), 691-700. DOI: 10.1145/3664647.3680755
  • Spatio-temporal interactive reasoning model for multi-group activity recognition. Pattern Recognition (2024), 111104. DOI: 10.1016/j.patcog.2024.111104
  • Spatio-Temporal Proximity-Aware Dual-Path Model for Panoramic Activity Recognition. In Computer Vision – ECCV 2024 (2024), 19-36. DOI: 10.1007/978-3-031-73242-3_2

    Published In

    MM '23: Proceedings of the 31st ACM International Conference on Multimedia
    October 2023
    9913 pages
    ISBN: 9798400701085
    DOI: 10.1145/3581783

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. action recognition
    2. hierarchical learning
    3. semantic aggregation

    Qualifiers

    • Research-article

    Funding Sources

    • the National Key R&D Program of China
    • the China Postdoctoral Science Foundation
    • the Natural Science Foundation of Jiangsu Province
    • the National Natural Science Foundation of China

    Conference

    MM '23: The 31st ACM International Conference on Multimedia
    October 29 - November 3, 2023
    Ottawa, ON, Canada

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
