Abstract
Human action recognition (HAR) is an active research topic in computer vision, and models based on Graph Convolutional Networks (GCNs) have shown clear advantages in skeleton-based HAR. However, most existing GCN-based methods neither consider the diversity of action trajectories nor highlight the key joints. To address these issues, we propose a supervised spatio-temporal contrastive learning framework with optimal skeleton subgraph topology for HAR (SSTCL-optSST). SSTCL-optSST builds a positive sample set from the samples that share the label of the target action (the anchor); each of these samples represents one trajectory of the action. This sample set is used to design a loss function that guides the model to recognize the different poses of an action. Furthermore, the subgraphs of the original skeleton graph are used to construct a skeleton subgraph topology space; each subgraph in this space is evaluated, and the optimal one is selected to highlight the key joints. Extensive experiments on the NTU RGB+D 60 and Kinetics datasets show that our model achieves competitive performance.
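The abstract describes two mechanisms without giving their formulas. The following is a minimal sketch of both under stated assumptions, not the authors' implementation. The loss is the standard supervised contrastive loss of Khosla et al. (2020), in which every other sample with the anchor's label is treated as a positive. The skeleton subgraph topology space is assumed to be all connected induced subgraphs with a fixed number of joints, each evaluated by a caller-supplied `score` function. The function names (`supcon_loss`, `select_optimal_subgraph`, `is_connected`), the temperature default, and the search space are hypothetical and are not taken from SSTCL-optSST.

```python
import math
from itertools import combinations
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple

Vector = Sequence[float]
Edge = Tuple[int, int]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def supcon_loss(embeddings: Sequence[Vector], labels: Sequence[int],
                temperature: float = 0.1) -> float:
    """Supervised contrastive loss in the style of Khosla et al. (2020).

    Assumption: this generic form stands in for the paper's own loss.
    For anchor i, every other same-label sample is a positive; the softmax
    denominator runs over all samples except the anchor itself.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # anchors without a same-label partner are skipped
        # Cosine similarity divided by the temperature, for every other sample.
        logits = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                  for j in range(n) if j != i}
        # Numerically stable log-sum-exp over the denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


def is_connected(joints: FrozenSet[int], edges: Sequence[Edge]) -> bool:
    """Depth-first search check that the joints form one connected component."""
    if not joints:
        return False
    adjacency = {j: set() for j in joints}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    start = next(iter(joints))
    seen, stack = {start}, [start]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(joints)


def select_optimal_subgraph(
    num_joints: int,
    edges: Sequence[Edge],
    subgraph_size: int,
    score: Callable[[FrozenSet[int], List[Edge]], float],
) -> Tuple[Optional[Tuple[FrozenSet[int], List[Edge]]], float]:
    """Score every connected induced subgraph with `subgraph_size` joints.

    Hypothetical search space and criterion: the paper's construction of its
    subgraph topology space and its evaluation measure are not given here.
    """
    best, best_score = None, -math.inf
    for subset in combinations(range(num_joints), subgraph_size):
        joints = frozenset(subset)
        sub_edges = [(u, v) for u, v in edges if u in joints and v in joints]
        if not is_connected(joints, sub_edges):
            continue  # keep only subgraphs that stay a single skeleton piece
        s = score(joints, sub_edges)
        if s > best_score:
            best, best_score = (joints, sub_edges), s
    return best, best_score


if __name__ == "__main__":
    # Same-label samples that are close in embedding space give a lower loss.
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(supcon_loss(emb, [0, 0, 1, 1]) < supcon_loss(emb, [0, 1, 0, 1]))  # True

    # Toy 5-joint chain 0-1-2-3-4 with made-up per-joint importance values.
    chain = [(0, 1), (1, 2), (2, 3), (3, 4)]
    importance = [0.1, 0.9, 0.8, 0.2, 0.05]
    best, best_score = select_optimal_subgraph(
        5, chain, 2, lambda joints, _edges: sum(importance[j] for j in joints))
    print(sorted(best[0]), round(best_score, 2))  # [1, 2] 1.7
```

Exhaustive enumeration is only practical for toy graphs: with the 25 joints of an NTU RGB+D skeleton, choosing 12 joints already yields more than five million candidate subsets, so a realistic search would need pruning or a heuristic.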
Acknowledgement
This work was supported in part by the National Natural Science Foundation of China under Grant No. 61977018, the Natural Science Foundation of Changsha under Grant No. kq2202215, and the Practical Innovation and Entrepreneurship Enhancement Program for Professional Degree Postgraduates of Changsha University of Science and Technology (CLSJCX22114).
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Deng, Z. et al. (2024). A Supervised Spatio-Temporal Contrastive Learning Framework with Optimal Skeleton Subgraph Topology for Human Action Recognition. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1964. Springer, Singapore. https://doi.org/10.1007/978-981-99-8141-0_13
DOI: https://doi.org/10.1007/978-981-99-8141-0_13
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8140-3
Online ISBN: 978-981-99-8141-0
eBook Packages: Computer Science, Computer Science (R0)