
Group Activity Recognition via Computing Human Pose Motion History and Collective Map from Video

  • Conference paper
Pattern Recognition (ACPR 2019)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12047)


Abstract

In this paper, we propose a deep-learning-based approach that exploits multi-person pose estimation from an image sequence to predict both individual actions and the collective activity of a group scene. We first apply multi-person pose estimation to extract pose information from the image sequence. From these poses we build a novel representation, the pose motion history (PMH), which aggregates the spatio-temporal dynamics of all human joints in the scene into a single stack of feature maps. Individual pose motion history stacks (Indi-PMH) are then cropped from the whole-scene stack and fed into a CNN to obtain per-person action predictions. Based on these individual predictions, we construct a collective map that encodes the positions and actions of all individuals in the scene into a feature map stack. The final group activity prediction is obtained by fusing the results of two classification CNNs: one takes the whole-scene pose motion history stack as input, and the other takes the collective map stack. We evaluate the proposed approach on the challenging Volleyball dataset, where it achieves very competitive performance compared to state-of-the-art methods.
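The abstract does not spell out how the pose motion history stack is computed, but the idea of collapsing per-frame joint heatmaps into a single temporally weighted stack can be sketched as follows. This is a minimal illustration, not the paper's implementation: the exponential decay weighting, the `decay` parameter, and the function name `pose_motion_history` are assumptions made here, in the spirit of classic motion-history images.

```python
import numpy as np

def pose_motion_history(joint_maps, decay=0.8):
    """Aggregate per-frame joint heatmaps into a single PMH stack.

    joint_maps : (T, J, H, W) array -- heatmaps for J joint types over
                 T frames, rendered for all people in the scene (e.g. from
                 a multi-person pose estimator).
    Returns a (J, H, W) stack in which recent frames dominate through
    exponential temporal decay, so each channel traces a joint's motion.
    """
    T = joint_maps.shape[0]
    # Oldest frame gets decay**(T-1); the newest frame gets decay**0 == 1.
    weights = decay ** np.arange(T - 1, -1, -1)
    # Weighted sum over the time axis: (T,) x (T, J, H, W) -> (J, H, W).
    pmh = np.tensordot(weights, joint_maps, axes=(0, 0))
    peak = pmh.max()
    return pmh / peak if peak > 0 else pmh
```

Under this reading, the individual stacks (Indi-PMH) would simply be spatial crops of the whole-scene array around each person's bounding box before classification.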



Author information

Correspondence to Shang-Hong Lai.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Chen, H.-Y., Lai, S.-H. (2020). Group Activity Recognition via Computing Human Pose Motion History and Collective Map from Video. In: Palaiahnakote, S., Sanniti di Baja, G., Wang, L., Yan, W. (eds.) Pattern Recognition. ACPR 2019. Lecture Notes in Computer Science, vol. 12047. Springer, Cham. https://doi.org/10.1007/978-3-030-41299-9_55


  • DOI: https://doi.org/10.1007/978-3-030-41299-9_55

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-41298-2

  • Online ISBN: 978-3-030-41299-9

  • eBook Packages: Computer Science (R0)
