Abstract
Analysing a swimmer’s performance at the end of each race is vital for coaches, since the analysis informs strategy for the next round. Coaches rely heavily on statistics such as stroke length and instantaneous velocity, which are usually derived from time-consuming manual video annotations. Obtaining these statistics automatically from swimming videos requires solving four challenging computer vision tasks: swimmer head detection, tracking, stroke detection, and camera calibration. We solve these problems collectively using a two-phase deep learning approach, which we call Deep Detector for Actions and Swimmer Heads (DeepDASH). DeepDASH achieves a 20.8% higher F1 score for swimmer head detection and operates 6 times faster than the popular Faster R-CNN object detector. We also propose a hierarchical tracking algorithm, based on the existing SORT algorithm, which we call HISORT. HISORT produces significantly longer tracks than SORT by preserving swimmer identities for longer periods of time. Finally, DeepDASH achieves an overall F1 score of 97.5% for stroke detection across all four swimming stroke styles.
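To make the tracking contribution concrete: the core of SORT (which HISORT extends hierarchically) is a frame-to-frame association step that matches predicted track boxes to new detections by intersection-over-union (IoU). The sketch below is illustrative only, not the paper's implementation; it substitutes a greedy matcher for the Hungarian assignment that SORT actually uses, and the function names are ours.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def associate(tracks, detections, iou_threshold=0.3):
    """Greedily match track boxes to detection boxes by descending IoU.

    Returns a list of (track_index, detection_index) pairs. Pairs whose
    IoU falls below the threshold are left unmatched; SORT spawns new
    tracks for unmatched detections and ages out unmatched tracks.
    """
    pairs = sorted(((iou(t, d), ti, di)
                    for ti, t in enumerate(tracks)
                    for di, d in enumerate(detections)),
                   reverse=True)
    matches, used_t, used_d = [], set(), set()
    for score, ti, di in pairs:
        if score < iou_threshold:
            break  # all remaining pairs score even lower
        if ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    return matches
```

In full SORT, the track boxes fed to this step come from per-track Kalman filter predictions rather than raw previous-frame boxes, and the assignment is solved optimally (e.g. with `scipy.optimize.linear_sum_assignment`) rather than greedily.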
Notes
The convolution layers preceding this operation are not completely translationally equivariant due to image boundary effects, but a fully connected layer only exacerbates this problem.
References
COCO: Common objects in context. http://cocodataset.org/. Accessed 22 Nov 2019
MOTChallenge: multiple object tracking benchmark. https://motchallenge.net/
Bernardin K, Stiefelhagen R (2008) Evaluating multiple object tracking performance: the CLEAR MOT metrics. EURASIP J Image Video Process 2008(1):246309. https://doi.org/10.1155/2008/246309
Bewley A, Ge Z, Ott L, Ramos F, Upcroft B (2016) Simple online and realtime tracking. In: 2016 IEEE international conference on image processing (ICIP), pp 3464–3468. IEEE
Buch S, Escorcia V, Ghanem B, Fei-Fei L, Niebles JC (2017) End-to-end, single-stream temporal action detection in untrimmed videos. In: BMVC, vol 2, p 7
Caba Heilbron F, Carlos Niebles J, Ghanem B (2016) Fast temporal activity proposals for efficient detection of human actions in untrimmed videos. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1914–1923
Chao YW, Vijayanarasimhan S, Seybold B, Ross DA, Deng J, Sukthankar R (2018) Rethinking the Faster R-CNN architecture for temporal action localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1130–1139
Dai X, Singh B, Zhang G, Davis LS, Qiu Chen Y (2017) Temporal context network for activity localization in videos. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 5793–5802
Duan K, Bai S, Xie L, Qi H, Huang Q, Tian Q (2019) CenterNet: keypoint triplets for object detection. In: Proceedings of the IEEE international conference on computer vision (ICCV)
Einfalt M, Zecha D, Lienhart R (2018) Activity-conditioned continuous human pose estimation for performance analysis of athletes using the example of swimming. In: 2018 IEEE winter conference on applications of computer vision (WACV), pp 446–455. IEEE
Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A (2012) The PASCAL visual object classes challenge (VOC2012) results. http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html
Fani H, Mirlohi A, Hosseini H, Herpers R (2018) Swim stroke analytic: front crawl pulling pose classification. In: 2018 25th IEEE international conference on image processing (ICIP), pp 4068–4072. IEEE
Gao J, Yang Z, Chen K, Sun C, Nevatia R (2017) Turn tap: temporal unit regression network for temporal action proposals. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 3628–3636
Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 1440–1448
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 580–587
Hakozaki K, Kato N, Tanabiki M, Furuyama J, Sato Y, Aoki Y (2018) Swimmer’s stroke estimation using CNN and MultiLSTM. J Sig Process 22(4):219–222
Hammad M, Pławiak P, Wang K, Acharya UR (2020) ResNet-attention model for human authentication using ECG signals. Expert Syst e12547
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 2961–2969
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Huang Y, Dai Q, Lu Y (2019) Decoupling localization and classification in single shot temporal action detection. In: IEEE international conference on multimedia and expo (ICME)
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167
Kuhn HW (1955) The Hungarian method for the assignment problem. Nav Res Log Quart 2(1–2):83–97
Law H, Deng J (2018) CornerNet: detecting objects as paired keypoints. In: Proceedings of the European conference on computer vision (ECCV), pp 734–750
Leal-Taixé L, Fenzi M, Kuznetsova A, Rosenhahn B, Savarese S (2014) Learning an image-based motion context for multiple people tracking. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 3542–3549
Leal-Taixé L, Milan A, Reid I, Roth S, Schindler K (2015) MOTChallenge 2015: towards a benchmark for multi-target tracking. arXiv preprint arXiv:1504.01942
Leal-Taixé L, Pons-Moll G, Rosenhahn B (2011) Everybody needs somebody: modeling social and grouping behavior on a linear programming multiple people tracker. In: 2011 IEEE international conference on computer vision workshops (ICCV workshops), pp 120–127. IEEE
Lin T, Zhao X, Shou Z (2017) Single shot temporal action detection. In: Proceedings of the 25th ACM international conference on multimedia, pp 988–996. ACM
Lin T, Zhao X, Su H, Wang C, Yang M (2018) BSN: boundary sensitive network for temporal action proposal generation. In: Proceedings of the European conference on computer vision (ECCV), pp 3–19
Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 2117–2125
Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 2980–2988
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) SSD: Single shot multibox detector. In: European conference on computer vision, pp 21–37. Springer
Nibali A, He Z, Morgan S, Greenwood D (2017) Extraction and classification of diving clips from continuous video footage. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 38–48
Nibali A, He Z, Morgan S, Prendergast L (2018) Numerical coordinate regression with convolutional neural networks. arXiv preprint arXiv:1801.07372
Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A (2017) Automatic differentiation in pytorch
Pirsiavash H, Ramanan D, Fowlkes CC (2011) Globally-optimal greedy algorithms for tracking a variable number of objects. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 1201–1208. IEEE
Pławiak P, Abdar M, Acharya UR (2019) Application of new deep genetic cascade ensemble of SVM classifiers to predict the Australian credit scoring. Appl Soft Comput 84:105740
Pławiak P, Abdar M, Pławiak J, Makarenkov V, Acharya UR (2020) DGHNL: a new deep genetic hierarchical network of learners for prediction of credit scoring. Inf Sci 516:401–418
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 779–788
Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 7263–7271
Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99
Ristani E, Solera F, Zou R, Cucchiara R, Tomasi C (2016) Performance measures and a data set for multi-target, multi-camera tracking. In: Computer vision—ECCV 2016 workshops, pp 17–35. Springer International Publishing, Cham
Sadeghian A, Alahi A, Savarese S (2017) Tracking the untrackable: learning to track multiple cues with long-term dependencies. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 300–311
Shou Z, Chan J, Zareian A, Miyazawa K, Chang SF (2017) CDC: convolutional-de-convolutional networks for precise temporal action localization in untrimmed videos. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 5734–5743
Shou Z, Wang D, Chang SF (2016) Temporal action localization in untrimmed videos via multi-stage CNNs. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 1049–1058
Tian Z, Shen C, Chen H, He T (2019) FCOS: fully convolutional one-stage object detection. In: Proceedings of international conference in computer vision (ICCV)
Tompson JJ, Jain A, LeCun Y, Bregler C (2014) Joint training of a convolutional network and a graphical model for human pose estimation. In: Advances in neural information processing systems, pp 1799–1807
Tsumita T, Shishido H, Kitahara I, Kameda Y (2019) Swimmer position estimation by lane rectification. In: International workshop on advanced image technology (IWAIT) 2019, vol 11049, p 110490E. International Society for Optics and Photonics
Tuncer T, Ertam F, Dogan S, Aydemir E, Pławiak P (2020) Ensemble residual network-based gender and activity recognition method with signals. J Supercomput 76(3):2119–2138
Victor B, He Z, Morgan S, Miniutti D (2017) Continuous video to simple signals for swimming stroke detection with convolutional neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp 66–75
Wang M, Liu Y, Huang Z (2017) Large margin object tracking with circulant feature maps. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), pp 4021–4029
Wojke N, Bewley A, Paulus D (2017) Simple online and realtime tracking with a deep association metric. In: 2017 IEEE international conference on image processing (ICIP), pp 3645–3649. IEEE
Wu Y, He K (2018) Group normalization. In: Proceedings of the European conference on computer vision (ECCV), pp 3–19
Xu H, Das A, Saenko K (2017) R-C3D: region convolutional 3D network for temporal activity detection. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 5783–5792
Zecha D, Einfalt M, Lienhart R (2019) Refining joint locations for human pose tracking in sports videos. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops (CVPRW)
Zhang L, Li Y, Nevatia R (2008) Global data association for multi-object tracking using network flows. In: 2008 IEEE conference on computer vision and pattern recognition (CVPR), pp 1–8. IEEE
Zhao Y, Xiong Y, Wang L, Wu Z, Tang X, Lin D (2017) Temporal action detection with structured segment networks. In: Proceedings of the IEEE international conference on computer vision (ICCV), pp 2914–2923
Zhu J, Yang H, Liu N, Kim M, Zhang W, Yang MH (2018) Online multi-object tracking with dual matching attention networks. In: Proceedings of the European conference on computer vision (ECCV), pp 366–382
Hartley R, Zisserman A (2004) Multiple view geometry in computer vision. Cambridge University Press, Cambridge
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
This work was funded by a competitive innovation fund from the Australian Institute of Sports. The project is titled “A software system for automated annotation of swimming videos using deep learning”. There are no other conflicts to declare.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
We thank the Australian Institute of Sports, Swimming Australia and Optus for providing the research innovation grant used to carry out this research.
About this article
Cite this article
Hall, A., Victor, B., He, Z. et al. The detection, tracking, and temporal action localisation of swimmers for automated analysis. Neural Comput & Applic 33, 7205–7223 (2021). https://doi.org/10.1007/s00521-020-05485-3