Skeleton-Based Action Quality Assessment via Partially Connected LSTM with Triplet Losses

Wang, Xinyu; Li, Jianwei; Hu, Haiqing

doi:10.1007/978-3-031-18913-5_17


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13536)

Included in the following conference series:

Chinese Conference on Pattern Recognition and Computer Vision (PRCV)


Abstract

Human action quality assessment (AQA) has recently attracted increasing attention in computer vision because of its practical applications, such as skill training, physical rehabilitation, and the scoring of sports events. In this paper, we propose a partially connected LSTM with triplet losses to evaluate different skill levels. We explain and discuss two characteristics that distinguish AQA from human action recognition (HAR), together with a countermeasure for each. To avoid the negative influence of complex joint movements in actions, the skeleton is not treated as a single graph. To explore hierarchical relations in the skeleton graph, the fully connected layer in the LSTM model is replaced by a partially connected layer, in which a diagonal matrix activates only the corresponding weights. Furthermore, to improve the generalization ability of the model, we add triplet-loss terms to the loss function, which pull samples with similar skill levels close to each other. We evaluate our model experimentally and compare it with seven LSTM architectures and three GNN architectures on the UMONS-TAICHI dataset and the walking gait dataset. Experimental results demonstrate that our model achieves outstanding performance.
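The abstract does not give the layer equations, so the following is only a minimal NumPy sketch of the two ideas under our own assumptions. We assume the input vector concatenates the 3D coordinates of all joints, that joints are grouped into hypothetical body parts, and that each part p has a diagonal 0/1 selector D_p = diag(m_p) so that its weight block W_p sees only that part's joints. In an LSTM, this masked projection would stand in for the dense input-to-gate transform. The triplet term is the standard FaceNet-style hinge on squared distances with an assumed margin of 0.2, where the anchor and positive share a skill level and the negative does not. The helper names (`part_selector`, `partially_connected`, `triplet_loss`) are illustrative and not taken from the paper.

```python
import numpy as np


def part_selector(num_joints, coords_per_joint, joint_ids):
    """Diagonal 0/1 matrix D_p keeping only the input features of `joint_ids`.

    Assumption: features are laid out joint by joint, e.g. [x0, y0, z0, x1, ...].
    """
    mask = np.zeros(num_joints * coords_per_joint)
    for j in joint_ids:
        mask[j * coords_per_joint:(j + 1) * coords_per_joint] = 1.0
    return np.diag(mask)


def partially_connected(x, weights, selectors, bias):
    """Partially connected projection: part p contributes W_p @ D_p @ x.

    Because D_p zeroes every feature outside part p, the outputs of part p
    do not depend on the joints of any other part.
    """
    outputs = [W @ (D @ x) for W, D in zip(weights, selectors)]
    return np.concatenate(outputs) + bias


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss on squared Euclidean distances (FaceNet style).

    `positive` shares the anchor's skill level, `negative` does not; the loss
    is zero once the negative is farther than the positive by at least `margin`.
    """
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, float(d_pos - d_neg + margin))


# Usage: 4 joints x 3 coordinates, split into two hypothetical parts.
rng = np.random.default_rng(0)
selectors = [part_selector(4, 3, [0, 1]), part_selector(4, 3, [2, 3])]
weights = [rng.standard_normal((2, 12)) for _ in selectors]  # 2 units per part
bias = np.zeros(4)

x = np.arange(12, dtype=float)
x_moved = x.copy()
x_moved[6:] += 100.0  # move only joints 2 and 3

y = partially_connected(x, weights, selectors, bias)
y_moved = partially_connected(x_moved, weights, selectors, bias)
print(np.allclose(y[:2], y_moved[:2]))  # True: part 0 ignores joints 2 and 3

anchor, positive = np.array([0.0, 0.0]), np.array([1.0, 0.0])
print(round(triplet_loss(anchor, positive, np.array([0.5, 0.0])), 2))  # 0.95
print(triplet_loss(anchor, positive, np.array([2.0, 0.0])))  # 0.0
```

The diagonal selector is written out explicitly to mirror the abstract's wording; in practice one would multiply by the mask vector directly (`W @ (mask * x)`), which gives the same result without building a dense diagonal matrix.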

Supported by the Open Projects Program of the National Laboratory of Pattern Recognition under Grant No. 202100009, and by the Fundamental Research Funds for the Central Universities No. 2021TD006.

Author information

Authors and Affiliations

School of Sports Engineering, Beijing Sports University, Beijing, China
Xinyu Wang, Jianwei Li & Haiqing Hu

Corresponding author

Correspondence to Jianwei Li.

Editor information

Editors and Affiliations

Southern University of Science and Technology, Shenzhen, China
Shiqi Yu
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Zhaoxiang Zhang
Hong Kong Baptist University, Hong Kong, China
Pong C. Yuen
Northwestern Polytechnical University, Xi'an, China
Junwei Han
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Tieniu Tan
Hong Kong Baptist University, Hong Kong, China
Yike Guo
Sun Yat-sen University, Guangzhou, China
Jianhuang Lai
Southern University of Science and Technology, Shenzhen, China
Jianguo Zhang

About this paper

Cite this paper

Wang, X., Li, J., Hu, H. (2022). Skeleton-Based Action Quality Assessment via Partially Connected LSTM with Triplet Losses. In: Yu, S., et al. Pattern Recognition and Computer Vision. PRCV 2022. Lecture Notes in Computer Science, vol 13536. Springer, Cham. https://doi.org/10.1007/978-3-031-18913-5_17

DOI: https://doi.org/10.1007/978-3-031-18913-5_17
Published: 27 October 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18912-8
Online ISBN: 978-3-031-18913-5
eBook Packages: Computer Science, Computer Science (R0)
