Abstract
Human action recognition has been an attractive research topic in recent years due to its wide range of applications. Among existing methods, Graph Convolutional Networks (GCNs) achieve remarkable results by exploiting the graph structure of skeleton data in both the spatial and temporal domains. Noise caused by pose estimation errors is an inherent issue that can seriously degrade action recognition performance. Existing graph-based methods mainly focus on improving recognition accuracy, whereas applications on devices with limited computational capacity require low-complexity models. In this paper, a lightweight model is proposed that prunes layers and adds Feature Fusion and Preset Joint Subset Selection modules. The proposed model takes advantage of recent GCNs and of the selection of informative joints. Two graph topologies are defined for the selected joints. Extensive experiments are conducted on public datasets to evaluate the performance of the proposed method. Experimental results show that the method outperforms the baselines on datasets with severe noise in the skeleton data. Moreover, the proposed model has 5.6 times fewer parameters than the baseline. The proposed lightweight models therefore offer feasible solutions for developing practical applications.
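To make the abstract's ingredients concrete (a spatial graph convolution applied to a preset subset of joints, with a graph topology defined over that subset), the following is a minimal NumPy sketch for illustration only. It is not the authors' implementation: the helper names (select_joints, normalized_adjacency, spatial_graph_conv), the 25-joint input, the subset indices, and the star topology are all hypothetical, and the symmetric normalization D^(-1/2)(A + I)D^(-1/2) is the common GCN choice rather than necessarily the one used in the paper.

```python
import numpy as np


def normalized_adjacency(num_nodes, edges):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph given as an edge list."""
    a = np.eye(num_nodes)  # self-loops so each joint keeps its own feature
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))  # degree includes the self-loop
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def select_joints(x, subset):
    """Keep only a preset subset of joints; x has shape (C, T, V)."""
    return x[:, :, subset]


def spatial_graph_conv(x, a_hat, w):
    """One spatial graph convolution applied to every frame.

    x: (C_in, T, V) features, a_hat: (V, V) normalized adjacency,
    w: (C_out, C_in) channel-mixing weights. Returns (C_out, T, V).
    """
    aggregated = np.einsum("ctv,vu->ctu", x, a_hat)  # sum features over neighbouring joints
    return np.einsum("oc,ctu->otu", w, aggregated)    # 1x1 channel projection


# Hypothetical preset subset: indices into a 25-joint skeleton (not the paper's choice).
SUBSET = [3, 5, 7, 9, 11]
# Hypothetical topology over the subset, in local indices: a star centred on joint 0.
SUBSET_EDGES = [(0, 1), (0, 2), (0, 3), (0, 4)]

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 30, 25))  # (xyz coordinates, 30 frames, 25 joints)
x_sub = select_joints(x, SUBSET)      # (3, 30, 5)
a_hat = normalized_adjacency(len(SUBSET), SUBSET_EDGES)
w = rng.standard_normal((16, 3))      # random weights stand in for learned ones
y = spatial_graph_conv(x_sub, a_hat, w)
print(y.shape)  # (16, 30, 5)
```

In this dense form the neighbour aggregation costs on the order of C x T x V^2 operations per layer, so shrinking the joint axis before the convolution makes each layer cheaper. A temporal convolution over the T axis, omitted here, would complete a spatio-temporal GCN block.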
Acknowledgements
This research is funded by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.01-2017.315.
Ethics declarations
Conflict of Interests
The authors declare that they have no conflict of interest.
About this article
Cite this article
Pham, DT., Pham, QT., Nguyen, TT. et al. A lightweight graph convolutional network for skeleton-based action recognition. Multimed Tools Appl 82, 3055–3079 (2023). https://doi.org/10.1007/s11042-022-13298-w
DOI: https://doi.org/10.1007/s11042-022-13298-w