Abstract
Collective activity classification is the task of identifying activities performed by multiple people, which often involves context information such as person relationships and person interactions. Most existing approaches assume that all individuals in a single image share the same activity label. However, in many real-world scenarios, multiple activities co-exist and serve as context cues for each other. Based on this observation, this paper proposes a unified discriminative learning framework of multiple context models for concurrent collective activity recognition. First, both the intra-class and inter-class behaviour interactions among persons in a scene are considered. In addition, the scenario in which activities occur provides further context information for recognizing specific collective activities. Finally, we jointly model the multiple context cues (intra-class, inter-class, and global context) within a max-margin learning framework. A greedy forward search method is used to label the activities in the testing scenes. Experimental results demonstrate the superiority of our approach in activity recognition.
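The greedy forward search mentioned in the abstract can be sketched as follows: labels are assigned one person at a time, always committing to the (person, label) choice that most increases a joint score. This is only a minimal illustrative sketch; the scoring function below is a toy stand-in for the learned max-margin context model, and all names (`greedy_forward_labeling`, `unary`, the agreement bonus) are hypothetical, not from the paper.

```python
def greedy_forward_labeling(persons, labels, score):
    """Greedily label each person, at every step taking the
    (person, label) choice that maximizes the joint score."""
    assignment = {}
    while len(assignment) < len(persons):
        best = None
        for p in persons:
            if p in assignment:
                continue
            for y in labels:
                candidate = dict(assignment, **{p: y})
                s = score(candidate)
                if best is None or s > best[0]:
                    best = (s, p, y)
        _, p, y = best
        assignment[p] = y
    return assignment

# Toy score: unary preferences plus a pairwise bonus when both persons
# agree, loosely mimicking an intra-class context potential. A real
# model would use learned unary and pairwise terms.
unary = {("a", "walk"): 1.0, ("a", "talk"): 0.2,
         ("b", "walk"): 0.4, ("b", "talk"): 0.5}

def score(assignment):
    s = sum(unary[(p, y)] for p, y in assignment.items())
    if len(assignment) == 2 and len(set(assignment.values())) == 1:
        s += 0.3  # agreement bonus (hypothetical value)
    return s

result = greedy_forward_labeling(["a", "b"], ["walk", "talk"], score)
# Person "b" prefers "talk" in isolation, but the context bonus
# pulls it toward "walk" to agree with "a".
```

The appeal of such greedy inference is that it avoids enumerating the exponential set of joint labelings while still letting context terms influence each decision.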
Acknowledgments
This work was supported by the National 863 Program (2014AA015104) and the National Natural Science Foundation of China (61273034 and 61332016).
Zhao, C., Wang, J. & Lu, H. Learning discriminative context models for concurrent collective activity recognition. Multimed Tools Appl 76, 7401–7420 (2017). https://doi.org/10.1007/s11042-016-3393-3