Abstract
Group activity recognition is challenging because of the complex motion of actors and the relations between them. To exploit the similar actions of actors, this paper proposes a novel multi-scale Sub-group Context Block (SCB) for group activity recognition. A node embedding matrix and an adjacency matrix are constructed and fed into the SCB. In the SCB, an assignment matrix learns the mapping from actors to sub-groups, so that the representations of sub-groups and the interactions between them are learned automatically. Graph convolution is then used to further refine the feature representation. To emphasize the effect of different sub-groups, a reinforcement-learning-based module, the Sub-group Attention Block (SAB), is designed; it models the weighting of sub-groups as a Markov decision process and assigns each sub-group an importance value for later processing. Multi-scale context for group activity at different levels is captured by fusing features obtained with various numbers of clusters. Finally, temporal information is integrated by merging multiple frames. Extensive experiments are performed on two standard group activity recognition datasets: Volleyball and Collective Activity. The proposed method achieves strong performance on both datasets, and the results validate that SCB and SAB are effective for group activity recognition.
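The mapping from actors to sub-groups described above can be illustrated with a small sketch. It assumes a soft-assignment pooling step in the style of differentiable graph pooling: a row-wise softmax over assignment scores gives a matrix S, sub-group features are computed as SᵀX, and sub-group interactions as SᵀAS. The abstract does not give the exact equations, so this formulation, the function name `pool_to_subgroups`, and the hand-picked scores (which a trained model would learn) are assumptions for illustration only.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def pool_to_subgroups(X, A, logits):
    """Hypothetical helper: pool N actors into K sub-groups.

    X: N x D actor (node) embeddings
    A: N x N adjacency between actors
    logits: N x K assignment scores (learned in a real model, fixed here)
    """
    S = [softmax(row) for row in logits]   # N x K soft assignment matrix
    St = transpose(S)                      # K x N
    X_sub = matmul(St, X)                  # K x D sub-group features
    A_sub = matmul(matmul(St, A), S)       # K x K sub-group interactions
    return S, X_sub, A_sub


# Four actors with 2-D features; actors 0-1 and 2-3 behave similarly.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
A = [[1.0, 1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0],
     [0.0, 0.0, 1.0, 1.0]]
logits = [[4.0, 0.0], [4.0, 0.0], [0.0, 4.0], [0.0, 4.0]]

S, X_sub, A_sub = pool_to_subgroups(X, A, logits)
```

Calling the same helper with score matrices of different widths K yields sub-group features at several scales, which is one plausible reading of "fusing features obtained with various numbers of clusters"; the fusion step itself is not sketched here.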
Funding
This research was funded by National Key Research and Development Project (No.2019YFB1405803).
Cite this article
Mao, K., Jin, P., Ping, Y. et al. Modeling multi-scale sub-group context for group activity recognition. Appl Intell 53, 1149–1161 (2023). https://doi.org/10.1007/s10489-022-03470-y
DOI: https://doi.org/10.1007/s10489-022-03470-y