
View transform graph attention recurrent networks for skeleton-based action recognition

Original Paper · Signal, Image and Video Processing

Abstract

Skeleton-based human action recognition has recently attracted considerable research attention due to the accessibility and popularity of 3D skeleton data. However, effectively representing spatial–temporal skeleton sequences is difficult because action representations vary greatly when captured from different viewpoints. To obtain a better representation of spatial–temporal skeletal features, this paper introduces a view-transform graph attention recurrent network (VT+GARN) for view-invariant human action recognition. We design a sequence-based view-invariant transform strategy that reduces the influence of viewpoint on the spatial–temporal positions of skeleton joints. A graph attention recurrent network then automatically computes attention coefficients on the transformed sequence, learns a representation of the spatiotemporal skeletal features, and outputs the classification result. Ablation studies and extensive experiments on three challenging datasets, Northwestern-UCLA, NTU RGB+D and UWA3DII, demonstrate the effectiveness and superiority of our method.
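To make these two stages concrete, the sketch below pairs a common sequence-level view normalization (translating the root joint to the origin and rotating the body into a canonical orientation) with a single graph-attention layer in the style of Veličković et al.'s GAT. This is a minimal illustration under stated assumptions, not the paper's implementation: the joint indices, layer sizes, and the choice of rotating about the vertical axis are placeholders, and VT+GARN's recurrent component and classifier are omitted.

```python
# Hypothetical sketch, not the authors' code: joint indices, layer sizes and
# the vertical-axis (z) rotation are illustrative assumptions.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

def view_normalize(seq, hip=0, l_hip=1, r_hip=2):
    """Map a skeleton sequence (T, J, 3) into a canonical view.

    Translates the hip joint of the first frame to the origin, then rotates
    about the z-axis so the left-hip -> right-hip vector of the first frame
    aligns with the x-axis; the same action captured from different
    viewpoints then yields similar joint coordinates.
    """
    seq = seq - seq[0, hip]                # root joint of frame 0 at the origin
    v = seq[0, r_hip] - seq[0, l_hip]      # across-body direction in frame 0
    theta = np.arctan2(v[1], v[0])         # its angle in the x-y plane
    c, s = np.cos(-theta), np.sin(-theta)  # rotate by -theta about z
    R = np.array([[c, -s, 0.], [s, c, 0.], [0., 0., 1.]], dtype=seq.dtype)
    return seq @ R.T                       # apply R to every joint, every frame

class GraphAttentionLayer(nn.Module):
    """One GAT-style layer: attention coefficients are computed from pairs of
    transformed joint features and normalized over each joint's neighbours."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)  # shared feature transform
        self.a = nn.Linear(2 * out_dim, 1, bias=False)   # attention scoring vector

    def forward(self, h, adj):
        # h: (J, in_dim) joint features; adj: (J, J) skeleton adjacency
        # (adj must include self-loops so every row has a neighbour).
        z = self.W(h)                                              # (J, out_dim)
        J = z.size(0)
        pairs = torch.cat([z.unsqueeze(1).expand(-1, J, -1),       # features of i
                           z.unsqueeze(0).expand(J, -1, -1)], -1)  # features of j
        e = F.leaky_relu(self.a(pairs).squeeze(-1), 0.2)           # raw scores (J, J)
        e = e.masked_fill(adj == 0, float('-inf'))                 # keep edges only
        alpha = torch.softmax(e, dim=-1)                           # attention coefficients
        return F.elu(alpha @ z)                                    # aggregate neighbours

# Toy usage: 30 frames of a 25-joint skeleton, chain adjacency with self-loops.
seq = view_normalize(np.random.randn(30, 25, 3).astype(np.float32))
adj = torch.eye(25) + torch.diag(torch.ones(24), 1) + torch.diag(torch.ones(24), -1)
feats = GraphAttentionLayer(3, 16)(torch.from_numpy(seq[0]), adj)  # (25, 16)
```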





Acknowledgements

This project is supported by the National Key R&D Program of China (Grant No. 2017YFB1302400), the National Natural Science Foundation of China (Grant Nos. 61773242, 61803227 and 61375084), the Major Agricultural Applied Technological Innovation Projects of Shandong Province (SD2019NJ014), the Shandong Natural Science Foundation (ZR2019MF064) and the Intelligent Robot and System Innovation Center Foundation (2019IRS19).

Author information


Corresponding author

Correspondence to Fengyu Zhou.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Huang, Q., Zhou, F., Qin, R. et al. View transform graph attention recurrent networks for skeleton-based action recognition. SIViP 15, 599–606 (2021). https://doi.org/10.1007/s11760-020-01781-6
