Abstract
Convolutional neural network (CNN)-based methods facilitate data classification but sacrifice physical interpretability owing to their complex architectures and tightly coupled inference. The interpretability requirements of our prior CNN-based golf classifier motivate us to explain its predictions and to discover the class-discriminative, significant regions of interest within golf swings. We do so by generalizing the 2D Guided Grad-CAM to a 1D formulation, which is presented in this work. Using this custom 1D Guided Grad-CAM, we visualize the golf predictions and the underlying golf dataset, highlight class-discriminative, significant regions of interest, and offer potential interpretations. Specifically, we investigate the attention performance and the corresponding attributions by visualizing and evaluating the classifier's predictions and the golf swings from five perspectives: attention consistency within particular classes, inspection of misclassified swings, Guided Grad-CAM visualizations at different layers, and attention shifts with respect to temporal resolution and with respect to sensor usage. The comprehensive experiments show that our visual inspections explain the previously reported classification performance, that the class-discriminative, significant features can be captured, and that every single prediction admits a reasonable interpretation. This exploration offers the possibility of associating the critical regions and features with the physical movements of golf players, which may contribute to golf training. Relevant code files are available at https://github.com/92xianshen/golf-guided-gradcam.
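The core of the 1D generalization described above can be sketched in a few lines: Grad-CAM weights each feature map of a convolutional layer by the temporal average of the class-score gradient, rectifies the weighted sum, and fuses the upsampled map with a guided-backpropagation saliency signal. The following is a minimal NumPy sketch under these assumptions, not the authors' released implementation; the function names, array shapes, and the use of linear interpolation for upsampling are illustrative choices.

```python
import numpy as np

def grad_cam_1d(activations, gradients):
    """Compute a 1D Grad-CAM importance curve.

    activations: (channels, length) feature maps from a 1D conv layer
    gradients:   (channels, length) gradients of the target class score
                 with respect to those feature maps
    Returns a (length,) curve normalized to [0, 1].
    """
    # Channel weights: global average pooling of the gradients over time
    weights = gradients.mean(axis=1)                          # (channels,)
    # Rectified weighted combination of the feature maps
    cam = np.maximum((weights[:, None] * activations).sum(axis=0), 0.0)
    # Normalize for visualization, guarding against an all-zero map
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

def guided_grad_cam_1d(cam, guided_backprop, input_length):
    """Fuse a coarse Grad-CAM curve with a guided-backprop saliency
    signal of shape (input_length,) by elementwise multiplication."""
    # Linearly interpolate the coarse CAM up to the input resolution
    x_old = np.linspace(0.0, 1.0, num=cam.shape[0])
    x_new = np.linspace(0.0, 1.0, num=input_length)
    cam_up = np.interp(x_new, x_old, cam)
    return cam_up * guided_backprop
```

In practice the activations and gradients would come from forward/backward hooks on the trained classifier, and `guided_backprop` from a backward pass in which negative gradients are zeroed at each ReLU; the sketch isolates only the map-combination step that distinguishes the 1D case from the original 2D formulation.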
Acknowledgements
This research was supported in part by the China-CEEC Higher Education Institutions Joint Educational Program 2022 “Cooperative research on intelligent action recognition model and interpretability for biofeedback system” under Grant 2022146, in part by the Fundamental Research Funds for the Central Universities under Grant 2022XJJD02, in part by Slovenian Research Agency within the Research Program “ICT4QoL-Information and Communications Technologies for Quality of Life” under Grant no. P2-0246, and in part by the Bilateral Project between Slovenia and China titled “Machine learning methods in real-time biofeedback systems”. We would like to thank Prof. Dr. Sašo Tomažič for his significant contributions.
Ethics declarations
Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jiao, L., Gao, W., Bie, R. et al. Golf Guided Grad-CAM: attention visualization within golf swings via guided gradient-based class activation mapping. Multimed Tools Appl 83, 38481–38503 (2024). https://doi.org/10.1007/s11042-023-17153-4