Abstract
Human activity recognition in RGB-D videos has been an active research topic during the last decade. However, only a few efforts have been made to recognize human activity in RGB-D videos in which several performers act simultaneously. In this paper we introduce such a challenging dataset, with several performers carrying out activities at the same time, and present a novel method for recognizing human activities performed simultaneously in the same video. The proposed method aims to capture the motion information of the whole video by producing a dynamic image corresponding to the input video. We use two parallel ResNet-101 architectures to produce the dynamic images for the RGB video and the depth video separately. A dynamic image contains only the motion information of the whole frame, which is the main cue for analyzing the motion of the performer during an action; dynamic images therefore aid recognition by concentrating only on the motion information appearing in the frames. The two dynamic images are passed through a fully connected layer for activity classification. The proposed dynamic image reduces the complexity of the recognition process by extracting a sparse matrix from a video while preserving the motion information required for activity recognition, and produces results comparable to the state-of-the-art.
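To illustrate the core idea, the sketch below computes a dynamic image with the simplified approximate rank pooling of Bilen et al.: a fixed, order-sensitive weighted sum of the frames, with weights alpha_t = 2t - T - 1. This is an assumption-laden simplification for illustration only, not the paper's full pipeline (which trains ResNet-101 streams end-to-end on the RGB and depth dynamic images).

```python
import numpy as np

def dynamic_image(frames):
    """Collapse a list of frames (H x W x C arrays) into one 'dynamic image'
    via simplified approximate rank pooling: a weighted sum whose weights
    alpha_t = 2t - T - 1 (t = 1..T) encode the temporal order of the frames."""
    T = len(frames)
    # Weights are negative for early frames, positive for late ones, and sum to 0,
    # so static background largely cancels and motion dominates the result.
    weights = np.array([2 * t - T - 1 for t in range(1, T + 1)], dtype=np.float64)
    stacked = np.stack(frames).astype(np.float64)        # shape (T, H, W, C)
    di = np.tensordot(weights, stacked, axes=1)          # shape (H, W, C)
    # Rescale to [0, 255] so the result can be fed to an image CNN.
    di -= di.min()
    if di.max() > 0:
        di *= 255.0 / di.max()
    return di.astype(np.uint8)
```

In the method described above, one such image would be produced per modality (RGB and depth) and each sent through its own ResNet-101 stream before the fully connected classification layer.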




Acknowledgements
The authors wish to acknowledge the financial support provided by the Science and Engineering Research Board (SERB), Government of India, through project grant ECR/2016/00652, and to thank the NVIDIA GPU grant team for providing the graphics card used to perform the experiments for this work.
Cite this article
Mukherjee, S., Anvitha, L. & Lahari, T.M. Human activity recognition in RGB-D videos by dynamic images. Multimed Tools Appl 79, 19787–19801 (2020). https://doi.org/10.1007/s11042-020-08747-3