Skeleton-based structured early activity prediction

Arzani, Mohammad M.; Fathy, Mahmood; Azirani, Ahmad A.; Adeli, Ehsan

doi:10.1007/s11042-020-08875-w

Published: 24 April 2020

Volume 80, pages 23023–23049, (2021)

Multimedia Tools and Applications

Abstract

To communicate with people, robots and vision-based interactive systems often need to understand human activities before they are fully performed. Such early prediction helps these systems plan their next steps and sustain a realistic interactive session with humans. However, predicting activities in advance is very challenging, because some activities are simple while others are complex and composed of several smaller atomic sub-activities. In this paper, we propose a method for early prediction of both simple and complex human activities, formulated as a structured prediction task using probabilistic graphical models (PGMs). We use skeletons captured by low-cost depth sensors as high-level descriptions of the human body. Because it relies on 3D skeletons, our method is robust to environmental factors. The proposed model is a fully observed PGM coupled with a clustering scheme that removes the model's dependency on the number-of-middle-states hyperparameter. We test our method on three popular datasets, CAD-60, UT-Kinect, and Florence 3D, and obtain accuracies of 97.6%, 100%, and 96.11%, respectively. These datasets cover both simple and complex activities. When only half of the clip is observed, we achieve 93.33% and 96.9% accuracy on the CAD-60 and UT-Kinect datasets, respectively.
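
The half-clip result above implies an evaluation in which the classifier sees only a leading portion of each skeleton sequence. The following is a minimal sketch of such a protocol, not the authors' implementation: the names `observe_prefix`, `early_accuracy`, and `predict`, the flattened per-frame layout, and the truncation rule (floor of the frame count times the ratio, at least one frame) are all assumptions made for illustration.

```python
from typing import Callable, Hashable, List, Sequence

# One frame = flattened 3D joint coordinates (hypothetical layout).
Frame = Sequence[float]
Clip = List[Frame]


def observe_prefix(clip: Clip, ratio: float) -> Clip:
    """Return the first `ratio` fraction of a clip's frames (at least one)."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    n_observed = max(1, int(len(clip) * ratio))
    return clip[:n_observed]


def early_accuracy(
    clips: List[Clip],
    labels: List[Hashable],
    predict: Callable[[Clip], Hashable],
    ratio: float,
) -> float:
    """Fraction of clips labeled correctly when only a prefix is observed.

    `predict` stands in for any trained activity classifier; it is a
    placeholder, not the structured model described in the abstract.
    """
    correct = sum(
        predict(observe_prefix(clip, ratio)) == label
        for clip, label in zip(clips, labels)
    )
    return correct / len(labels)
```

Calling `early_accuracy(clips, labels, predict, 0.5)` would correspond to the "half of the clip" setting, under the assumption that the observed portion is measured in frames from the start of the clip.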

Author information

Authors and Affiliations

Iran University of Science and Technology, Tehran, Iran
Mohammad M. Arzani, Mahmood Fathy & Ahmad A. Azirani
School of Computer Science, Institute for Research in Fundamental Sciences (IPM), P.O. Box 19395-5746, Tehran, Iran
Mahmood Fathy
Stanford University, Stanford, CA, 94305, USA
Ehsan Adeli

Corresponding author

Correspondence to Mahmood Fathy.

About this article

Cite this article

Arzani, M.M., Fathy, M., Azirani, A.A. et al. Skeleton-based structured early activity prediction. Multimed Tools Appl 80, 23023–23049 (2021). https://doi.org/10.1007/s11042-020-08875-w

Received: 30 March 2019
Revised: 11 February 2020
Accepted: 19 March 2020
Published: 24 April 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11042-020-08875-w
