
ADAL-GCN: Action Description Aided Learning Graph Convolution Network for Early Action Prediction

  • Conference paper
  • In: Pattern Recognition and Computer Vision (PRCV 2024)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15041)


Abstract

Early human action prediction aims to predict the complete action sequence from only the partial sequence observed at the initial stage of an action. Because executing a single action usually relies on the synergistic coordination of multiple key body parts, and because different body parts move only minimally at the onset of an action, early action prediction is highly sensitive to both where an action begins and what type of action it is. Current skeleton-based action prediction methods focus primarily on action classification and have limited ability to discriminate between actions that are semantically related. For instance, actions concentrated on elbow-joint movements, such as “touching the neck” and “touching the head,” are difficult to distinguish through classification alone but can be separated through their semantic relationships. Therefore, when differentiating similar actions, incorporating descriptions of the specific joint movements involved can enhance the model’s feature extraction ability. This paper introduces an Action Description Aided Learning Graph Convolutional Network (ADAL-GCN), which uses large language models as knowledge engines to pre-generate descriptions of the key body parts involved in different actions. These descriptions are then transformed into semantically rich feature vectors through text encoding. In addition, the model adopts a lightweight design: it decouples features across the channel and temporal dimensions, consolidates redundant network modules, and strategically migrates computation to optimize processing efficiency. Experimental results demonstrate that the proposed method achieves significant performance improvements and substantial reductions in training time without additional computational overhead.
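The paper's own code is not reproduced here, but the description-encoding step the abstract outlines can be sketched. The following is a minimal, hypothetical illustration: descriptions of key body parts (of the kind an LLM might pre-generate) are mapped to semantically rich feature vectors with a pretrained CLIP text encoder. The checkpoint name, the example descriptions, and the use of the pooled embedding are assumptions for illustration, not details taken from the paper.

```python
# A minimal sketch (not the authors' code) of encoding pre-generated action
# descriptions into feature vectors with a pretrained CLIP text encoder.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")
text_encoder.eval()

# Hypothetical descriptions of the key body parts involved in two easily
# confused, elbow-centric actions (examples invented for illustration).
descriptions = {
    "touch neck": "the elbow bends sharply while the hand rises to the neck",
    "touch head": "the elbow lifts higher and the hand reaches above the shoulders to the head",
}

with torch.no_grad():
    tokens = tokenizer(list(descriptions.values()),
                       padding=True, return_tensors="pt")
    # Pooled [EOS] embedding: one semantically rich vector per action.
    text_features = text_encoder(**tokens).pooler_output

print(text_features.shape)  # torch.Size([2, 512])
```

Vectors of this kind could then guide the skeleton branch, for example through a similarity objective between pooled GCN features and the text feature of the matching class; how ADAL-GCN combines the two modalities is specified in the paper itself.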

This work was supported in part by the Natural Science Foundation of Xinjiang Uygur Autonomous Region, China, under Grant No. 2022D01A59; the National Natural Science Foundation of China under Grant No. U20A20167; the Key Research Foundation of Integration of Industry and Education and the Development of New Business Studies Research Center, Xinjiang University of Science and Technology, under Grant No. 2022-KYZD02; and the Innovation Capability Improvement Plan Project of Hebei Province under Grant No. 22567637H. The authors also gratefully acknowledge the reviewers' helpful comments and suggestions, which have improved the paper.



Author information

Corresponding author

Correspondence to Fengda Zhao.



Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Li, X., Dong, Y., Ning, X., Zhang, P., Zhao, F. (2025). ADAL-GCN: Action Description Aided Learning Graph Convolution Network for Early Action Prediction. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15041. Springer, Singapore. https://doi.org/10.1007/978-981-97-8795-1_1


  • DOI: https://doi.org/10.1007/978-981-97-8795-1_1


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-8794-4

  • Online ISBN: 978-981-97-8795-1

  • eBook Packages: Computer Science, Computer Science (R0)
