
Automatic group activity annotation for mobile videos

  • Special Issue Paper
  • Multimedia Systems

Abstract

With the rapid spread of modern mobile devices, users can capture a variety of videos anytime and anywhere. This explosive growth of mobile video makes categorization and management increasingly difficult. In this paper, we propose a novel approach to annotating group activities in mobile videos, which tags each person with an activity label and thus helps users manage their uploaded videos efficiently. To extract rich context information, we jointly model three co-existing cues: the activity duration, the individual action features, and the context information shared through person-person interactions. These appearance and context cues are then combined in a structured learning framework, whose inference is solved with a greedy forward search. Moreover, we can jointly infer the group activity labels of all persons together with their activity durations, even when multiple group activities co-exist. Experimental results on a mobile video dataset show that the proposed approach achieves outstanding performance for group activity classification and annotation.
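To make the inference step concrete, the following Python sketch illustrates a generic greedy forward search over per-person label assignments, combining unary (individual action) scores with pairwise (interaction context) scores. This is only a minimal illustration of the general technique named in the abstract, not the authors' implementation: the arrays `unary` and `pairwise` and the function name `greedy_forward_search` are hypothetical, and the paper's actual model also includes activity duration terms.

```python
# A minimal sketch (not the authors' implementation) of greedy forward-search
# inference: labels are committed one person at a time, always choosing the
# (person, label) pair that most increases the joint score.
import numpy as np

def greedy_forward_search(unary, pairwise):
    """Greedily assign an activity label to each person.

    unary:    (n_persons, n_labels) individual action scores.
    pairwise: (n_labels, n_labels) compatibility of co-occurring labels,
              standing in for the shared person-interaction context.
    """
    n_persons, n_labels = unary.shape
    labels = [None] * n_persons
    unassigned = set(range(n_persons))

    while unassigned:
        best = None  # (gain, person, label)
        for p in unassigned:
            for l in range(n_labels):
                # Gain = unary term + context agreement with persons
                # that have already been labelled.
                gain = unary[p, l] + sum(
                    pairwise[l, labels[q]]
                    for q in range(n_persons)
                    if labels[q] is not None
                )
                if best is None or gain > best[0]:
                    best = (gain, p, l)
        _, p, l = best
        labels[p] = l          # commit the highest-gain assignment
        unassigned.remove(p)
    return labels

# Toy usage: 3 persons, 2 candidate group-activity labels.
rng = np.random.default_rng(0)
unary = rng.normal(size=(3, 2))
pairwise = np.array([[1.0, -0.5], [-0.5, 1.0]])  # same-label pairs agree
print(greedy_forward_search(unary, pairwise))
```

In this kind of scheme, the pairwise term encourages persons in the same scene toward mutually compatible activity labels, which is what allows context to correct ambiguous individual action evidence.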


Acknowledgments

This work was supported by the 863 Program (2014AA015104) and the National Natural Science Foundation of China (61273034 and 61332016).

Author information

Corresponding author

Correspondence to Jinqiao Wang.


About this article


Cite this article

Zhao, C., Wang, J., Li, J. et al. Automatic group activity annotation for mobile videos. Multimedia Systems 23, 667–677 (2017). https://doi.org/10.1007/s00530-016-0514-9

