
Automatic group activity annotation for mobile videos

  • Special Issue Paper
  • Multimedia Systems

Abstract

With the rapid spread of modern mobile devices, users can capture a variety of videos anytime and anywhere. This explosive growth of mobile video makes categorization and management increasingly difficult. In this paper, we propose a novel approach to annotating group activities in mobile videos, which tags each person with an activity label and thus helps users manage their uploaded videos efficiently. To extract rich context information, we jointly model three co-existing cues: the activity duration, the individual action features, and the context information shared through person-person interactions. These appearance and context cues are then combined in a structured learning framework, whose inference is solved with a greedy forward search. Moreover, we can jointly infer the group activity labels of all persons together with their activity durations, even when multiple group activities co-exist. Experimental results on a mobile video dataset show that the proposed approach achieves outstanding performance for group activity classification and annotation.
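To make the inference step concrete, the following Python sketch illustrates a generic greedy forward search over per-person label assignments, combining unary (individual action) scores with pairwise (interaction context) scores. This is only a minimal illustration of the general technique named in the abstract, not the authors' implementation: the arrays `unary` and `pairwise` and the function name `greedy_forward_search` are hypothetical, and the paper's actual model also includes activity duration terms.

```python
# A minimal sketch (not the authors' implementation) of greedy forward-search
# inference: labels are committed one person at a time, always choosing the
# (person, label) pair that most increases the joint score.
import numpy as np

def greedy_forward_search(unary, pairwise):
    """Greedily assign an activity label to each person.

    unary:    (n_persons, n_labels) individual action scores.
    pairwise: (n_labels, n_labels) compatibility of co-occurring labels,
              standing in for the shared person-interaction context.
    """
    n_persons, n_labels = unary.shape
    labels = [None] * n_persons
    unassigned = set(range(n_persons))

    while unassigned:
        best = None  # (gain, person, label)
        for p in unassigned:
            for l in range(n_labels):
                # Gain = unary term + context agreement with persons
                # that have already been labelled.
                gain = unary[p, l] + sum(
                    pairwise[l, labels[q]]
                    for q in range(n_persons)
                    if labels[q] is not None
                )
                if best is None or gain > best[0]:
                    best = (gain, p, l)
        _, p, l = best
        labels[p] = l          # commit the highest-gain assignment
        unassigned.remove(p)
    return labels

# Toy usage: 3 persons, 2 candidate group-activity labels.
rng = np.random.default_rng(0)
unary = rng.normal(size=(3, 2))
pairwise = np.array([[1.0, -0.5], [-0.5, 1.0]])  # same-label pairs agree
print(greedy_forward_search(unary, pairwise))
```

In this kind of scheme, the pairwise term encourages persons in the same scene toward mutually compatible activity labels, which is what allows context to correct ambiguous individual action evidence.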


Acknowledgments

This work was supported by the 863 Program (2014AA015104) and the National Natural Science Foundation of China (61273034 and 61332016).

Author information

Corresponding author

Correspondence to Jinqiao Wang.


About this article


Cite this article

Zhao, C., Wang, J., Li, J. et al. Automatic group activity annotation for mobile videos. Multimedia Systems 23, 667–677 (2017). https://doi.org/10.1007/s00530-016-0514-9

