Abstract
Action quality assessment (AQA) aims to automatically judge a human action from a video of that action and assign it a performance score. Most existing AQA methods divide RGB videos into short clips, transform each clip into a higher-level representation using a Convolutional 3D (C3D) network, and aggregate these representations by averaging; the aggregated representation is then used to predict the score. We find that this averaging-based aggregation is insufficient to capture the relative importance of clip-level features. In this work, we propose a learning-based weighted-averaging technique, which we call Weight-Decider (WD), that yields better performance at little additional computational cost. We also experiment with ResNets for learning better representations for action quality assessment, and we assess how the depth and input clip size of the convolutional neural network affect the quality of score predictions. We achieve a new state-of-the-art Spearman's rank correlation of 0.9315 (an increase of 0.45%) on the MTL-AQA dataset using a 34-layer (2+1)D ResNet processing 32-frame clips, with WD aggregation.
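The core idea of weighted aggregation can be sketched as follows. This is an illustrative NumPy sketch, not the paper's exact WD module: it assumes a single hypothetical linear scorer (`w`, `b`) that assigns each clip an importance score, which a softmax normalizes into weights for a convex combination of clip features.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def weighted_aggregate(clip_feats, w, b=0.0):
    """Learned weighted averaging of clip-level features.

    clip_feats: (n_clips, d) array of per-clip features.
    w, b: parameters of a hypothetical one-layer importance scorer.
    Returns the aggregated feature (d,) and the clip weights (n_clips,).
    """
    scores = clip_feats @ w + b      # one importance score per clip
    alphas = softmax(scores)         # weights sum to 1 across clips
    return alphas @ clip_feats, alphas

rng = np.random.default_rng(0)
feats = rng.normal(size=(9, 16))     # e.g. 9 clips, 16-dim features
w = rng.normal(size=16)
agg, alphas = weighted_aggregate(feats, w)
```

Plain averaging is the special case where every clip receives weight 1/n; here the weights are produced from the features themselves, so the parameters of the scorer can be trained end-to-end with the score regressor.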
Notes
1. Weights available at: https://github.com/kenshohara/3D-ResNets-PyTorch.
2. Weights available at: https://github.com/moabitcoin/ig65m-pytorch.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Farabi, S., Himel, H., Gazzali, F., Hasan, M.B., Kabir, M.H., Farazi, M. (2022). Improving Action Quality Assessment Using Weighted Aggregation. In: Pinho, A.J., Georgieva, P., Teixeira, L.F., Sánchez, J.A. (eds) Pattern Recognition and Image Analysis. IbPRIA 2022. Lecture Notes in Computer Science, vol 13256. Springer, Cham. https://doi.org/10.1007/978-3-031-04881-4_46
DOI: https://doi.org/10.1007/978-3-031-04881-4_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04880-7
Online ISBN: 978-3-031-04881-4
eBook Packages: Computer Science (R0)