DOI: 10.1145/3484274.3484288
research-article

Temporal-Aware Graph Convolution Network for Skeleton-based Action Recognition

Published: 23 November 2021

ABSTRACT

Graph convolutional networks (GCNs) have drawn attention for skeleton-based action recognition because a skeleton, with its joints and bones, can be naturally regarded as a graph. However, existing methods are limited in how they model the temporal sequence of human actions. To account for temporal factors in action modeling, we present a novel Temporal-Aware Graph Convolution Network (TA-GCN). First, we design a causal temporal convolution (CTCN) layer to ensure that no future information leaks into the past. Second, we present a novel cross-spatial-temporal graph convolution (3D-GCN) layer that extends an adaptive graph from the spatial to the temporal domain to capture local cross-spatial-temporal dependencies among joints. By incorporating these two temporal factors, TA-GCN can model the sequential nature of human actions. Experimental results on two large-scale datasets, NTU-RGB+D and Kinetics-Skeleton, indicate that our network improves accuracy by about 1% on both datasets over previous methods.
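
The causal constraint described in the abstract can be illustrated with a minimal sketch. This is not the authors' CTCN layer: it is a plain one-dimensional causal convolution in pure Python, and the function name `causal_conv1d`, the kernel values, and the `dilation` parameter are all assumptions made for illustration. It shows only the defining property, namely that the output at frame t depends on frames at or before t.

```python
from typing import List, Sequence


def causal_conv1d(
    x: Sequence[float], weights: Sequence[float], dilation: int = 1
) -> List[float]:
    """Hypothetical helper: y[t] = sum_i weights[i] * x[t - i * dilation].

    Frames before the start of the sequence are treated as zero, which is
    equivalent to left-padding the input with (len(weights) - 1) * dilation zeros.
    """
    out: List[float] = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            src = t - i * dilation  # only the current frame and past frames
            if src >= 0:
                acc += w * x[src]
        out.append(acc)
    return out


# Illustrative values only (not taken from the paper).
signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [0.5, 0.3, 0.2]

baseline = causal_conv1d(signal, kernel)

# Change only the last two frames; outputs for the first three must not move.
perturbed_signal = signal[:3] + [100.0, -100.0]
perturbed = causal_conv1d(perturbed_signal, kernel)

assert baseline[:3] == perturbed[:3]
```

In a deep-learning framework the same effect is usually obtained by padding only the left side of the time axis before a standard convolution. How the paper's CTCN layer is actually implemented is not stated in the abstract.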


        • Published in

          ICCCV '21: Proceedings of the 4th International Conference on Control and Computer Vision
          August 2021
          207 pages
          ISBN: 9781450390477
          DOI: 10.1145/3484274

          Copyright © 2021 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Qualifiers

          • research-article
          • Research
          • Refereed limited