Abstract
Visual repetition appears in many forms, including human activities, animal behaviors, and natural phenomena. Counting visual repetitions remains challenging, especially in long videos, where repetitions are often characterized by discontinuous actions and inconsistent cycles. Existing methods focus on counting repetitive actions in short videos and, because of these characteristics, struggle to count repetitions accurately in long videos. To tackle this challenge, we propose a multi-stride collaborative counting framework based on adaptive temporal correlation that estimates repetitions in both short and long videos. The framework predicts the final count from the counting results obtained on the same video sampled with different strides. In addition, because existing repetition counting datasets do not adequately cover the challenging scenarios considered in our work, we collected and labeled a new dataset, ActCount, which contains 172 videos with approximately 1,870 annotated repetitive actions. ActCount includes non-human-centric repetitions, making it more realistic and challenging. On the RepCount dataset, our model outperforms all previous models, achieving an MAE of 0.3053 and an OBO of 0.3708 and setting a new state of the art.
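The abstract reports MAE and OBO without defining them. The sketch below assumes the definitions commonly used in the repetition counting literature: MAE as the mean absolute count error normalized by the ground-truth count (lower is better), and OBO ("off-by-one" accuracy) as the fraction of videos whose predicted count is within one of the ground truth (higher is better). The function names and the sample counts are illustrative and do not come from the paper.

```python
from typing import Sequence


def _check(pred: Sequence[float], gt: Sequence[float]) -> None:
    # Both lists must describe the same, non-empty set of videos.
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")


def mae(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Normalized mean absolute error: mean of |pred - gt| / gt.

    Assumes every ground-truth count is positive.
    """
    _check(pred, gt)
    if any(g <= 0 for g in gt):
        raise ValueError("ground-truth counts must be positive")
    return sum(abs(p - g) / g for p, g in zip(pred, gt)) / len(gt)


def obo(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Off-by-one accuracy: share of videos with |pred - gt| <= 1."""
    _check(pred, gt)
    hits = sum(1 for p, g in zip(pred, gt) if abs(p - g) <= 1)
    return hits / len(gt)


if __name__ == "__main__":
    # Made-up counts for three videos, for illustration only.
    predicted = [4, 10, 7]
    actual = [5, 10, 10]
    print(f"MAE = {mae(predicted, actual):.4f}")
    print(f"OBO = {obo(predicted, actual):.4f}")
```

Under these definitions the two metrics move in opposite directions: a better model lowers MAE and raises OBO.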
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gan, G., Su, J., Wen, Z., Zhang, S. (2023). Temporal Repetition Counting Based on Multi-stride Collaboration. In: Jin, Z., Jiang, Y., Buchmann, R.A., Bi, Y., Ghiran, AM., Ma, W. (eds) Knowledge Science, Engineering and Management. KSEM 2023. Lecture Notes in Computer Science, vol 14119. Springer, Cham. https://doi.org/10.1007/978-3-031-40289-0_24
DOI: https://doi.org/10.1007/978-3-031-40289-0_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40288-3
Online ISBN: 978-3-031-40289-0
eBook Packages: Computer Science (R0)