Abstract
This paper addresses the challenging problem of understanding complex human activities in long videos. Toward this goal, we propose a hierarchical description of an activity video that captures the "which" of the activity, the "what" of its atomic actions, and the "when" of those atomic actions in the video. In our work, each complex activity is characterized as a composition of simple motion units, called atomic actions, and different atomic actions are explained by different video segments. We develop a latent discriminative structural model that detects the complex activity and its atomic actions while simultaneously learning the temporal structure of the atomic actions. A segment-annotation mapping matrix is introduced to relate video segments to their associated atomic actions, allowing different video segments to explain different atomic actions. This mapping matrix is treated as latent information in the model, since its ground truth is unavailable during both training and testing. Moreover, we present a semi-supervised learning method that automatically predicts the atomic-action labels of unlabeled training videos when labeled training data is limited, greatly alleviating the laborious and time-consuming annotation of atomic actions in training data. Experiments on three activity datasets demonstrate that our method achieves promising activity recognition results and yields rich, hierarchical descriptions of activity videos.
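The core idea of the abstract — scoring an activity by inferring a latent mapping from video segments to atomic actions — can be illustrated with a minimal sketch. This is not the authors' model: the linear classifiers, dimensions, and random features below are all hypothetical, and the latent mapping is inferred here by a simple per-segment argmax rather than the paper's structured inference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all sizes hypothetical): 3 activity classes, each composed
# of 4 atomic actions, and a video split into 6 segments of 10-D features.
n_activities, n_atomic, n_segments, dim = 3, 4, 6, 10

# Linear atomic-action templates: w[y][a] scores how well a segment
# matches atomic action a of activity class y.
w = rng.normal(size=(n_activities, n_atomic, dim))
segments = rng.normal(size=(n_segments, dim))  # per-segment features

def score_activity(y, segments):
    """Score activity y by inferring a latent segment-annotation mapping:
    each segment is assigned to its best-scoring atomic action, and the
    per-segment scores are summed."""
    s = segments @ w[y].T          # (n_segments, n_atomic) score table
    mapping = s.argmax(axis=1)     # latent mapping: segment -> atomic action
    return s.max(axis=1).sum(), mapping

scores = [score_activity(y, segments)[0] for y in range(n_activities)]
predicted = int(np.argmax(scores))          # the "which": detected activity
_, mapping = score_activity(predicted, segments)
print(predicted, mapping)  # mapping gives the "what"/"when" per segment
```

The mapping array plays the role of the segment-annotation mapping matrix in binary form: entry `mapping[i] == a` corresponds to a 1 in row `i`, column `a`. The paper's model additionally learns temporal structure over the atomic actions, which this per-segment argmax ignores.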
Acknowledgments
This work was supported in part by the Natural Science Foundation of China (NSFC) under Grant Nos. 61203274, 61375044 and 61472038.
Communicated by Junsong Yuan, Wanqing Li, Zhengyou Zhang, David Fleet, and Jamie Shotton.
Cite this article
Liu, C., Wu, X. & Jia, Y. A Hierarchical Video Description for Complex Activity Understanding. Int J Comput Vis 118, 240–255 (2016). https://doi.org/10.1007/s11263-016-0897-2