Abstract
We present a practical, privacy-preserving, on-device method for estimating the repetition count of an action in a given video stream. Our approach computes the pairwise similarity between sampled frames of the video, using per-frame features extracted by the feature extraction module and a suitable distance metric in the temporal self-similarity (TSM) calculation module. The resulting TSM matrix is passed to the count prediction module, which predicts the repetition count of the action in the video. The count prediction module is deliberately designed to ignore the extracted per-frame features, which are video specific. This self-similarity bottleneck makes the model class agnostic and allows it to generalize to actions not observed during training. We use the largest available dataset for repetition counting, Countix, for training and evaluation, and also propose an effective way of augmenting the training data in Countix. Our experiments show accuracies comparable to the state of the art with significantly smaller model footprints.
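The TSM construction described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes negative squared Euclidean distance as the similarity metric and a row-wise softmax normalization, and the function name and toy feature data are hypothetical.

```python
import numpy as np

def temporal_self_similarity(frame_features: np.ndarray) -> np.ndarray:
    """Build an N x N temporal self-similarity matrix from per-frame
    feature vectors (shape N x D): negative squared Euclidean distance
    between every pair of frames, then a row-wise softmax."""
    # Pairwise squared Euclidean distances via the expansion
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = np.sum(frame_features ** 2, axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] \
        - 2.0 * frame_features @ frame_features.T
    sims = -np.maximum(dists, 0.0)  # clamp tiny negatives from rounding
    # Row-wise softmax turns each row into a similarity distribution,
    # discarding the raw (video-specific) feature values.
    sims -= sims.max(axis=1, keepdims=True)
    exp = np.exp(sims)
    return exp / exp.sum(axis=1, keepdims=True)

# Toy example: 8 "frames" containing two repetitions of a 4-frame cycle.
rng = np.random.default_rng(0)
base = rng.standard_normal((4, 16))
features = np.concatenate([base, base], axis=0)
tsm = temporal_self_similarity(features)
print(tsm.shape)  # (8, 8)
```

Because frame 0 and frame 4 have identical features in this toy sequence, row 0 of the TSM assigns them equal (maximal) similarity; it is exactly this periodic structure, rather than the features themselves, that a downstream count prediction module can consume, which is what makes the bottleneck class agnostic.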
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Khurana, R., Vachhani, J.R., Rakshith, S., Gothe, S.V. (2023). Class Agnostic, On-Device and Privacy Preserving Repetition Counting of Actions from Videos Using Similarity Bottleneck. In: Gupta, D., Bhurchandi, K., Murala, S., Raman, B., Kumar, S. (eds) Computer Vision and Image Processing. CVIP 2022. Communications in Computer and Information Science, vol 1777. Springer, Cham. https://doi.org/10.1007/978-3-031-31417-9_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31416-2
Online ISBN: 978-3-031-31417-9