
Modeling multi-scale sub-group context for group activity recognition

Applied Intelligence

Abstract

Group activity recognition is challenging because of the complex motion of actors and the relations between them. To exploit the similar actions of actors, this paper proposes a novel multi-scale Sub-group Context Block (SCB) for group activity recognition. A node embedding matrix and an adjacency matrix are constructed and fed into the SCB. In the SCB, an assignment matrix learns the mapping from actors to sub-groups, so that the representations of sub-groups and the interactions between them can be learned automatically. Graph convolution is then used to further refine the feature representation. To emphasize the effect of different sub-groups, a reinforcement-learning-based module, the Sub-group Attention Block (SAB), is designed; it models the weighting of sub-groups as a Markov decision process and assigns each sub-group an importance value for subsequent processing. Multi-scale context for group activity at different levels is obtained by fusing features computed with various numbers of clusters. Finally, temporal information is integrated by merging multiple frames. Extensive experiments are performed on two standard group activity recognition datasets, Volleyball and Collective Activity. The proposed method achieves strong performance on both, and the results validate that the SCB and SAB are effective for group activity recognition.
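The sub-group pooling idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: it assumes a soft, row-normalized assignment matrix computed from the actor graph, a standard normalized graph convolution, and max-pooling plus concatenation for multi-scale fusion. All function names, shapes, and the random (untrained) weights are hypothetical, and the SAB and temporal merging steps are omitted.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def subgroup_pool(X, A, W_assign):
    """Softly assign N actors to K sub-groups (assumed form, not from the paper).

    X: (N, D) node embeddings, A: (N, N) adjacency, W_assign: (D, K).
    """
    S = softmax(A @ X @ W_assign, axis=1)  # (N, K); each actor's row sums to 1
    X_sub = S.T @ X                        # (K, D) sub-group representations
    A_sub = S.T @ A @ S                    # (K, K) sub-group interactions
    return S, X_sub, A_sub

def graph_conv(X, A, W):
    """One graph convolution with self-loops and symmetric normalization."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ A_hat @ d_inv_sqrt @ X @ W, 0.0)  # ReLU

def multi_scale_context(X, A, cluster_sizes, rng):
    """Fuse sub-group features obtained with several cluster numbers."""
    feats = []
    for k in cluster_sizes:
        # Random weights stand in for parameters that would be learned.
        W_assign = rng.standard_normal((X.shape[1], k)) * 0.1
        W_gcn = rng.standard_normal((X.shape[1], X.shape[1])) * 0.1
        _, X_sub, A_sub = subgroup_pool(X, A, W_assign)
        H = graph_conv(X_sub, A_sub, W_gcn)  # refine sub-group features
        feats.append(H.max(axis=0))          # pool over sub-groups: (D,)
    return np.concatenate(feats)             # (len(cluster_sizes) * D,)

rng = np.random.default_rng(0)
N, D = 12, 16                                # e.g. 12 actors, 16-dim features
X = rng.standard_normal((N, D))
A = np.abs(rng.standard_normal((N, N)))
A = (A + A.T) / 2                            # symmetric, non-negative adjacency
S, X_sub, A_sub = subgroup_pool(X, A, rng.standard_normal((D, 3)))
context = multi_scale_context(X, A, cluster_sizes=(2, 3, 4), rng=rng)
print(context.shape)  # (48,)
```

Using a soft assignment keeps the actor-to-sub-group mapping differentiable, which is what would allow it to be learned end to end rather than fixed by a separate clustering step.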

Funding

This research was funded by the National Key Research and Development Project (No. 2019YFB1405803).

Author information

Corresponding author

Correspondence to Peiyang Jin.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mao, K., Jin, P., Ping, Y. et al. Modeling multi-scale sub-group context for group activity recognition. Appl Intell 53, 1149–1161 (2023). https://doi.org/10.1007/s10489-022-03470-y

  • DOI: https://doi.org/10.1007/s10489-022-03470-y
