Abstract
Collective activity classification is the task of identifying activities performed by multiple people, which often involves context information such as person relationships and person interactions. Most existing approaches assume that all individuals in a single image share the same activity label. However, in many real-world scenarios, multiple activities co-exist and serve as context cues for each other. Based on this observation, this paper proposes a unified discriminative learning framework of multiple context models for concurrent collective activity recognition. First, both the intra-class and inter-class behaviour interactions among persons in a scene are considered. In addition, the scenario in which activities occur provides further context information for recognizing specific collective activities. Finally, we jointly model the multiple context cues (intra-class, inter-class, and global context) within a max-margin learning framework. A greedy forward search method is used to label the activities in the testing scenes. Experimental results demonstrate the superiority of our approach in activity recognition.
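The greedy forward search mentioned in the abstract can be sketched as follows: labels are assigned one person at a time, always committing to the (person, label) choice that most increases a joint score. This is only a minimal illustrative sketch; the scoring function below is a toy stand-in for the learned max-margin context model, and all names (`greedy_forward_labeling`, `unary`, the agreement bonus) are hypothetical, not from the paper.

```python
def greedy_forward_labeling(persons, labels, score):
    """Greedily label each person, at every step taking the
    (person, label) choice that maximizes the joint score."""
    assignment = {}
    while len(assignment) < len(persons):
        best = None
        for p in persons:
            if p in assignment:
                continue
            for y in labels:
                candidate = dict(assignment, **{p: y})
                s = score(candidate)
                if best is None or s > best[0]:
                    best = (s, p, y)
        _, p, y = best
        assignment[p] = y
    return assignment

# Toy score: unary preferences plus a pairwise bonus when both persons
# agree, loosely mimicking an intra-class context potential. A real
# model would use learned unary and pairwise terms.
unary = {("a", "walk"): 1.0, ("a", "talk"): 0.2,
         ("b", "walk"): 0.4, ("b", "talk"): 0.5}

def score(assignment):
    s = sum(unary[(p, y)] for p, y in assignment.items())
    if len(assignment) == 2 and len(set(assignment.values())) == 1:
        s += 0.3  # agreement bonus (hypothetical value)
    return s

result = greedy_forward_labeling(["a", "b"], ["walk", "talk"], score)
# Person "b" prefers "talk" in isolation, but the context bonus
# pulls it toward "walk" to agree with "a".
```

The appeal of such greedy inference is that it avoids enumerating the exponential set of joint labelings while still letting context terms influence each decision.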
Acknowledgments
This work was supported by the National 863 Program (2014AA015104) and the National Natural Science Foundation of China (61273034 and 61332016).
Zhao, C., Wang, J. & Lu, H. Learning discriminative context models for concurrent collective activity recognition. Multimed Tools Appl 76, 7401–7420 (2017). https://doi.org/10.1007/s11042-016-3393-3