
Improved use of descriptors for early recognition of actions in video

Published in Multimedia Tools and Applications

Abstract

Action recognition is a popular research topic in the computer vision community. A recent trend in this field, known as early action recognition, seeks to recognize an action from as few frames as possible. Visual bag-of-words methods, which rely on local descriptors quantized into visual words, have been used for both offline and early action recognition. In this paper, we propose an improvement to bag-of-words approaches by means of what we name patterns, i.e. co-occurrences of visual words. Experiments on benchmark datasets suggest that our method achieves better accuracy than a basic bag-of-words baseline, and it outperforms some state-of-the-art methods at certain observation ratios. Furthermore, whereas several methods in the literature operate on segments or video partitions as their working unit, our method is more granular and can update its prediction as soon as a new descriptor arrives.
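The central idea, augmenting a bag-of-words representation with patterns (co-occurrences of visual words) that can be updated one descriptor at a time, can be illustrated with a short sketch. The Python code below is a minimal illustration under our own assumptions: the class name EarlyPatternBoW, the fixed co-occurrence window, and the nearest-reference-vector classifier are hypothetical simplifications, not the paper's actual formulation.

```python
import numpy as np

class EarlyPatternBoW:
    """Minimal sketch: bag-of-words early action recognition where the
    representation is augmented with co-occurrence "patterns" of visual
    words. Names and design choices are illustrative assumptions, not
    the paper's exact formulation."""

    def __init__(self, codebook, class_models, window=10):
        self.codebook = np.asarray(codebook)      # (K, D) visual-word centroids
        self.class_models = class_models          # label -> reference vector, length K + K*K
        self.window = window                      # temporal window defining co-occurrence
        k = len(self.codebook)
        self.word_counts = np.zeros(k)            # plain bag-of-words part
        self.pattern_counts = np.zeros((k, k))    # co-occurrence ("pattern") part
        self.recent_words = []                    # words seen in the current window

    def _quantize(self, descriptor):
        # Assign a local spatio-temporal descriptor to its nearest visual word.
        distances = np.linalg.norm(self.codebook - np.asarray(descriptor), axis=1)
        return int(np.argmin(distances))

    def _feature(self):
        # Concatenate normalized word and pattern counts into one vector.
        words = self.word_counts / max(self.word_counts.sum(), 1.0)
        patterns = self.pattern_counts / max(self.pattern_counts.sum(), 1.0)
        return np.concatenate([words, patterns.ravel()])

    def update(self, descriptor):
        """Fold one incoming descriptor into the representation and return
        the current prediction, refreshing the estimate per descriptor
        rather than per video segment."""
        w = self._quantize(descriptor)
        self.word_counts[w] += 1
        for v in self.recent_words:               # each recent word co-occurring with w
            self.pattern_counts[v, w] += 1
        self.recent_words = (self.recent_words + [w])[-self.window:]
        f = self._feature()
        # Nearest reference vector; a trained classifier (e.g. an SVM)
        # would replace this in a real system.
        return min(self.class_models,
                   key=lambda c: np.linalg.norm(f - self.class_models[c]))
```

A video would then be processed by calling update once per extracted local descriptor (e.g. from space-time interest points), so a prediction is available at any observation ratio rather than only at segment boundaries.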



Author information


Corresponding author

Correspondence to Farzin Yaghmaee.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Saremi, M., Yaghmaee, F. Improved use of descriptors for early recognition of actions in video. Multimed Tools Appl 82, 2617–2633 (2023). https://doi.org/10.1007/s11042-022-13316-x
