
A progressive hierarchical analysis model for collective activity recognition

  • S.I: Machine Learning based semantic representation and analytics for multimedia application
Published in Neural Computing and Applications

Abstract

We propose a progressive hierarchical analysis model for collective activity recognition. Compared with previous activity recognition work, it not only recognizes the collective activity but also perceives the location and action category of each individual. First, we perform temporal consistency detection for each individual in the collective activity: a person detection network and a conditional random field produce the bounding box sequence of each activity participant. Next, we recognize individual actions with an LSTM operating on learned spatial features and motion features. Finally, the recognized person-level action category vectors are combined with scene context features and interaction context features to recognize the collective activity. We evaluate the proposed approach on benchmark collective activity datasets, and extensive experiments demonstrate the effectiveness of the progressive hierarchical analysis model.
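The three-stage pipeline described in the abstract can be sketched as follows. This is a minimal, self-contained illustration with randomly initialized weights and hypothetical feature dimensions, not the authors' implementation: `person_action_scores` stands in for the LSTM-based individual action recognizer, and the final linear classifier stands in for the collective-activity model that fuses person-level action vectors with scene and interaction context.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration only.
N_PERSONS, T, FEAT_DIM = 5, 10, 32
N_ACTIONS, N_ACTIVITIES = 4, 3

# Stand-in weights; in the paper these would be learned networks.
W_action = rng.standard_normal((FEAT_DIM, N_ACTIONS))
W_group = rng.standard_normal((N_ACTIONS + 2 * FEAT_DIM, N_ACTIVITIES))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def person_action_scores(track_features):
    """Stand-in for the LSTM individual-action recognizer:
    temporally pools per-frame features, then classifies."""
    pooled = track_features.mean(axis=0)
    return softmax(pooled @ W_action)

# Stage 1 (assumed already done): person detection + CRF yields one
# per-frame feature sequence of shape (T, FEAT_DIM) per tracked person.
tracks = [rng.standard_normal((T, FEAT_DIM)) for _ in range(N_PERSONS)]

# Stage 2: person-level action category vectors, one per participant.
action_vecs = np.stack([person_action_scores(tr) for tr in tracks])

# Stage 3: fuse the pooled action vectors with scene-context and
# interaction-context features, then classify the collective activity.
scene_ctx = rng.standard_normal(FEAT_DIM)
interaction_ctx = rng.standard_normal(FEAT_DIM)
fused = np.concatenate([action_vecs.mean(axis=0), scene_ctx, interaction_ctx])
activity = int(np.argmax(fused @ W_group))
```

The progressive structure shows up in the data flow: each stage consumes only the outputs of the previous one, so the collective-activity classifier never sees raw pixels, only person-level action scores plus context features.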


Figures 1, 2 and 3



Acknowledgements

This work was supported by the Research Programs of Henan Science and Technology Department (192102210097, 192102210126, 212102210160, 182102210210), the National Natural Science Foundation of China (61806073) and the Open Project Foundation of Information Technology Research Base of Civil Aviation Administration of China (NO. CAAC-ITRB-201607).

Author information


Corresponding author

Correspondence to Xuezhuan Zhao.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Pei, L., Zhao, X., Li, T. et al. A progressive hierarchical analysis model for collective activity recognition. Neural Comput & Applic 34, 12415–12425 (2022). https://doi.org/10.1007/s00521-021-06585-4

