Dual attention based spatial-temporal inference network for volleyball group activity recognition

Li, Yanshan; Liu, Yan; Yu, Rui; Zong, Hailin; Xie, Weixin

doi:10.1007/s11042-022-13867-z


Published: 07 October 2022

Multimedia Tools and Applications, Volume 82, pages 15515–15533 (2023)

Yanshan Li (ORCID: 0000-0002-8814-4628)^1,2,3, Yan Liu^1,2,3, Rui Yu^1,2,3, Hailin Zong^1,2,3 & Weixin Xie^1,2


Abstract

With the growing demand for video content analysis, sports video activity recognition has broad application prospects and commercial value in tasks such as computer-assisted highlight extraction, tactical statistics and strategic analysis. Volleyball group activity recognition aims to understand the action performed by a group of players in a volleyball match. However, cluttered backgrounds and complex relationships between individuals make group activity recognition in sports video a significant and challenging problem. We therefore propose a dual attention based spatial-temporal inference network for volleyball group activity recognition. First, a dual attention model composed of spatial attention and mixture channel attention is proposed to dynamically assign an attention weight to each feature and to focus on the interdependencies among group members. By capturing rich contextual relationships, it improves the model's ability to discriminate feature representations that exhibit intra-class variation. Next, to focus on individual spatial-temporal information, an individual spatial-temporal inference network (ISTIN) is designed to capture person-group interactions and emphasize the variability of this information. Finally, these features are fed into a recurrent neural network that captures temporal dependencies and performs the classification. Experimental results on two benchmark datasets, the Volleyball dataset and the Collective Activity dataset, show that the approach is effective for group activity recognition and improves recognition rates over the baseline method.
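The abstract describes the dual attention model only at a high level. As a rough illustration of the general idea, the sketch below applies dot-product self-attention to a group of N player feature vectors in two directions: across group members (standing in for the spatial branch) and across the C feature channels. It follows the simplified position/channel attention pattern of DANet (Fu et al., 2019), not the paper's own mixture channel attention, whose details are not given here. Treating members as the "positions", the residual fusion, the fixed weights `gamma_s`/`gamma_c` (learnable in DANet) and all function names are assumptions made for illustration; there are no learned projections.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major: Matrix[row][col]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # (n x k) @ (k x m) -> (n x m), plain nested loops for clarity.
    inner = len(b)
    return [[sum(a[i][t] * b[t][j] for t in range(inner)) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax_rows(a: Matrix) -> Matrix:
    out = []
    for row in a:
        m = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def spatial_attention(x: Matrix) -> Matrix:
    """Member-to-member attention over x of shape (N members x C channels):
    each member's output is a similarity-weighted mix of all N members."""
    weights = softmax_rows(matmul(x, transpose(x)))  # (N x N)
    return matmul(weights, x)                        # (N x C)


def channel_attention(x: Matrix) -> Matrix:
    """Channel-to-channel attention: each channel becomes a similarity-weighted
    mix of all C channels, with similarities measured across the N members."""
    xt = transpose(x)                                # (C x N)
    weights = softmax_rows(matmul(xt, x))            # (C x C)
    return transpose(matmul(weights, xt))            # back to (N x C)


def dual_attention(x: Matrix, gamma_s: float = 1.0, gamma_c: float = 1.0) -> Matrix:
    """Residual fusion x + gamma_s * spatial + gamma_c * channel.
    Assumption: fixed scalar weights and no learned projections, unlike DANet."""
    s = spatial_attention(x)
    c = channel_attention(x)
    return [[x[i][j] + gamma_s * s[i][j] + gamma_c * c[i][j] for j in range(len(x[0]))]
            for i in range(len(x))]


if __name__ == "__main__":
    # Hypothetical toy input: 3 players, each with a 4-dimensional feature vector.
    players = [[1.0, 0.0, 0.5, 0.2],
               [0.9, 0.1, 0.4, 0.3],
               [0.0, 1.0, 0.1, 0.8]]
    fused = dual_attention(players)
    print(len(fused), "x", len(fused[0]))  # 3 x 4
```

The two branches differ only in the direction of the similarity computation: an N x N map relating players to one another versus a C x C map relating feature channels. In a trained network both would normally operate on learned projections of the features.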



Acknowledgements

This work was partially supported by the National Natural Science Foundation of China (62076165, 61871154), the Natural Science Foundation of Guangdong Province (No. 2019A1515011307), the Shenzhen Science and Technology Project (No. JCYJ20180507182259896) and other projects (Nos. 2020KCXTD004, WDZC20195500201).

Author information

Authors and Affiliations

ATR National Key Lab. of Defense Technology, Shenzhen University, Shenzhen, 518061, China
Yanshan Li, Yan Liu, Rui Yu, Hailin Zong & Weixin Xie
Department of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518061, China
Yanshan Li, Yan Liu, Rui Yu, Hailin Zong & Weixin Xie
China Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen, 518061, China
Yanshan Li, Yan Liu, Rui Yu & Hailin Zong


Corresponding author

Correspondence to Yanshan Li.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Availability of data and material

Data and material are fully available without restriction.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Li, Y., Liu, Y., Yu, R. et al. Dual attention based spatial-temporal inference network for volleyball group activity recognition. Multimed Tools Appl 82, 15515–15533 (2023). https://doi.org/10.1007/s11042-022-13867-z


Received: 10 November 2021
Revised: 01 August 2022
Accepted: 05 September 2022
Published: 07 October 2022
Issue Date: April 2023
DOI: https://doi.org/10.1007/s11042-022-13867-z
