
ADAL-GCN: Action Description Aided Learning Graph Convolution Network for Early Action Prediction

  • Conference paper
  • In: Pattern Recognition and Computer Vision (PRCV 2024)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15041)


Abstract

Early human action prediction aims to predict the complete action sequence from only the partial sequence observed at the initial stage of an action. Because executing a single action usually relies on the synergistic coordination of multiple key body parts, and because different body parts move only minimally at the onset of an action, early action prediction is highly sensitive to both where an action begins and what type of action it is. Current skeleton-based action prediction methods focus primarily on action classification and have limited ability to discriminate between actions that are semantically related. For instance, actions concentrated on elbow-joint movements, such as “touching the neck” and “touching the head,” are difficult to distinguish through classification alone but can be separated through their semantic relationships. Therefore, when differentiating similar actions, incorporating descriptions of the specific joint movements involved can enhance the model’s feature extraction ability. This paper introduces an Action Description Aided Learning Graph Convolutional Network (ADAL-GCN), which uses large language models as knowledge engines to pre-generate descriptions of the key body parts involved in different actions. These descriptions are then transformed into semantically rich feature vectors through text encoding. In addition, the model adopts a lightweight design: it decouples features across the channel and temporal dimensions, consolidates redundant network modules, and strategically migrates computation to optimize processing efficiency. Experimental results demonstrate that the proposed method achieves significant performance improvements and substantial reductions in training time without additional computational overhead.
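The paper's own code is not reproduced here, but the description-encoding step the abstract outlines can be sketched. The following is a minimal, hypothetical illustration: descriptions of key body parts (of the kind an LLM might pre-generate) are mapped to semantically rich feature vectors with a pretrained CLIP text encoder. The checkpoint name, the example descriptions, and the use of the pooled embedding are assumptions for illustration, not details taken from the paper.

```python
# A minimal sketch (not the authors' code) of encoding pre-generated action
# descriptions into feature vectors with a pretrained CLIP text encoder.
import torch
from transformers import CLIPTokenizer, CLIPTextModel

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-base-patch32")
text_encoder.eval()

# Hypothetical descriptions of the key body parts involved in two easily
# confused, elbow-centric actions (examples invented for illustration).
descriptions = {
    "touch neck": "the elbow bends sharply while the hand rises to the neck",
    "touch head": "the elbow lifts higher and the hand reaches above the shoulders to the head",
}

with torch.no_grad():
    tokens = tokenizer(list(descriptions.values()),
                       padding=True, return_tensors="pt")
    # Pooled [EOS] embedding: one semantically rich vector per action.
    text_features = text_encoder(**tokens).pooler_output

print(text_features.shape)  # torch.Size([2, 512])
```

Vectors of this kind could then guide the skeleton branch, for example through a similarity objective between pooled GCN features and the text feature of the matching class; how ADAL-GCN combines the two modalities is specified in the paper itself.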

This work was supported in part by the Natural Science Foundation of Xinjiang Uygur Autonomous Region, China, under Grant No. 2022D01A59; the National Natural Science Foundation of China under Grant No. U20A20167; the Key Research Foundation of Integration of Industry and Education and the Development of New Business Studies Research Center, Xinjiang University of Science and Technology, under Grant No. 2022-KYZD02; and the Innovation Capability Improvement Plan Project of Hebei Province under Grant No. 22567637H. The authors also gratefully acknowledge the reviewers' helpful comments and suggestions, which have improved the paper.



Author information

Corresponding author

Correspondence to Fengda Zhao.



Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Li, X., Dong, Y., Ning, X., Zhang, P., Zhao, F. (2025). ADAL-GCN: Action Description Aided Learning Graph Convolution Network for Early Action Prediction. In: Lin, Z., et al. Pattern Recognition and Computer Vision. PRCV 2024. Lecture Notes in Computer Science, vol 15041. Springer, Singapore. https://doi.org/10.1007/978-981-97-8795-1_1


  • DOI: https://doi.org/10.1007/978-981-97-8795-1_1


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-8794-4

  • Online ISBN: 978-981-97-8795-1

  • eBook Packages: Computer Science, Computer Science (R0)
