Temporal Repetition Counting Based on Multi-stride Collaboration

Gan, Guoxi; Su, Jia; Wen, Zonghui; Zhang, Shenmeng

doi:10.1007/978-3-031-40289-0_24

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14119)

Included in the following conference series:

International Conference on Knowledge Science, Engineering and Management

Abstract

Visual repetition occurs in many forms, including human activities, animal behaviors, and natural phenomena. Counting visual repetitions remains challenging, especially in long videos, where actions are often discontinuous and cycle lengths are inconsistent. Existing methods, which focus on short videos, count repetitions in long videos inaccurately because of these characteristics. To address this, we propose a multi-stride collaborative counting framework based on adaptive temporal correlation that estimates repetitions in both short and long videos. The framework predicts the final count from the counts obtained for the same video sampled with different strides. Because existing repetition counting datasets do not adequately cover the challenging scenarios considered in our work, we also collected and labeled a new dataset, ActCount, which contains 172 videos with approximately 1,870 annotated repetitive actions. ActCount includes non-human-centric repetitions, making it more realistic and challenging. Our model outperforms all previous models on the RepCount dataset, achieving an MAE of 0.3053 and an OBO of 0.3708, a new state of the art.
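The abstract names two ingredients that a short sketch may help make concrete: sampling one video at several strides, and the MAE and OBO scores. The Python below is only an illustration under assumptions, not the authors' implementation. The function names are hypothetical, and the metric definitions are the ones commonly used in the repetition counting literature (MAE as the mean of |predicted - true| / true, OBO as the share of videos whose predicted count is within one of the true count); the chapter itself may define them slightly differently. How the per-stride counts are combined into the final prediction is not described in the abstract, so that step is left out.

```python
from typing import List, Sequence


def stride_indices(num_frames: int, stride: int) -> List[int]:
    """Frame indices kept when a video is sampled every `stride` frames."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return list(range(0, num_frames, stride))


def mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Normalized mean absolute error: mean of |pred - true| / true.

    Assumes every ground-truth count is positive (no zero-count videos).
    """
    return sum(abs(p - t) / t for p, t in zip(pred, true)) / len(true)


def obo(pred: Sequence[float], true: Sequence[float]) -> float:
    """Off-by-one accuracy: share of videos with |pred - true| <= 1."""
    return sum(abs(p - t) <= 1 for p, t in zip(pred, true)) / len(true)


# The same 10-frame video viewed at three hypothetical strides.
views = {s: stride_indices(10, s) for s in (1, 2, 3)}

# Toy predicted and true counts for three videos (made-up numbers).
predicted = [4, 10, 7]
actual = [5, 10, 3]
scores = (mae(predicted, actual), obo(predicted, actual))
```

Under these definitions a lower MAE is better, while OBO is an accuracy, so higher is better. A larger stride covers the same video with fewer frames, which is why a long video and a short one can be brought to a comparable number of sampled frames.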

Author information

Authors and Affiliations

Capital Normal University, Beijing, China
Guoxi Gan, Jia Su, Zonghui Wen & Shenmeng Zhang

Corresponding author

Correspondence to Jia Su.

Editor information

Editors and Affiliations

Peking University, Beijing, China
Zhi Jin
South China Normal University, Guangzhou, China
Yuncheng Jiang
Babeș-Bolyai University, Cluj-Napoca, Romania
Robert Andrei Buchmann
Ulster University, Belfast, UK
Yaxin Bi
Babeș-Bolyai University, Cluj-Napoca, Romania
Ana-Maria Ghiran
South China Normal University, Guangzhou, China
Wenjun Ma

About this paper

Cite this paper

Gan, G., Su, J., Wen, Z., Zhang, S. (2023). Temporal Repetition Counting Based on Multi-stride Collaboration. In: Jin, Z., Jiang, Y., Buchmann, R.A., Bi, Y., Ghiran, A.-M., Ma, W. (eds) Knowledge Science, Engineering and Management. KSEM 2023. Lecture Notes in Computer Science, vol 14119. Springer, Cham. https://doi.org/10.1007/978-3-031-40289-0_24

DOI: https://doi.org/10.1007/978-3-031-40289-0_24
Published: 09 August 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40288-3
Online ISBN: 978-3-031-40289-0
eBook Packages: Computer Science, Computer Science (R0)
