Skip to main content

Advertisement

Log in

Probabilistic selection of frames for early action recognition in videos

  • Short Paper
  • Published:
International Journal of Multimedia Information Retrieval Aims and scope Submit manuscript

Abstract

Early action recognition seeks to recognize human actions in a video, while the video has been only partially observed. In this paper, we introduce an approach to this kind of recognition task. In some offline (non-early) recognition works, it has been proposed to sample frames of the video uniformly and use them in training of the model. However, there is no reason that uniform sampling should be optimal, so we propose a non-uniform sampling to make it more tailored to early recognition. The proposed method samples the frames in such a way that earlier frames are more likely to be chosen. These frames are then used in training a deep network architecture. We compare our sampling approach with a uniform sampling process, using HMDB51 dataset as a benchmark. We further compare our method with other state-of-the-art early recognition works. The experimental results suggest that our sampling process leads to better recognition accuracy than uniform sampling, at the early stages of the video, and that our proposed algorithm outperforms the state-of-the-art.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3

References

  1. Cao Y, Barrett D, Barbu A, Narayanaswamy S, Yu H, Michaux A, Lin Y, Dickinson S, Siskind JM, Wang S (2013) Recognize human activities from partially observed videos. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition, pp 2658–2665. https://doi.org/10.1109/CVPR.2013.343

  2. Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE, pp 248–255

  3. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778.

  4. Hu JF, Zheng WS, Ma L, Wang G, Lai JH, Zhang J (2018) Early action prediction by soft regression. IEEE Trans Pattern Anal Mach Intell 41(11):2568–2583. https://doi.org/10.1109/TPAMI.2018.2863279

    Article  Google Scholar 

  5. Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: 32nd international conference on machine learning, ICML 2015, vol 1. International Machine Learning Society (IMLS), pp 448–456

  6. Kong Y, Fu Y (2016) Max-margin action prediction machine. IEEE Trans Pattern Anal Mach Intell 38(9):1844–1858. https://doi.org/10.1109/TPAMI.2015.2491928

    Article  Google Scholar 

  7. Kong Y, Kit D, Fu Y (2014) A discriminative model with multiple temporal scales for action prediction. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), vol 8693. LNCS, pp 596–611. https://doi.org/10.1007/978-3-319-10602-1_39

    Chapter  Google Scholar 

  8. Kong Y, Tao Z, Fu Y (2017) Deep sequential context networks for action prediction. In: 2017 IEEE conference on computer vision and pattern recognition (CVPR), pp 3662–3670. https://doi.org/10.1109/CVPR.2017.390. http://ieeexplore.ieee.org/document/8099873/

  9. Kuehne H, Jhuang H, Garrote E, Poggio T, Serre T (2011) HMDB: a large video database for human motion recognition. In: Proceedings of the IEEE international conference on computer vision, pp 2556–2563. https://doi.org/10.1109/ICCV.2011.6126543

  10. Lai S, Zheng WS, Hu JF, Zhang J (2017) Global-local temporal saliency action prediction. IEEE Trans Image Process. https://doi.org/10.1109/TIP.2017.2751145

    Article  MATH  Google Scholar 

  11. Li K, Fu Y (2014) Prediction of human activity by discovering temporal sequence patterns. IEEE Trans Pattern Anal Mach Intell 36(8):1644–1657. https://doi.org/10.1109/TPAMI.2013.2297321

    Article  Google Scholar 

  12. Ryoo MS (2011) Human activity prediction: early recognition of ongoing activities from streaming videos. In: Proceedings of the IEEE international conference on computer vision, pp 1036–1043. https://doi.org/10.1109/ICCV.2011.6126349

  13. Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. Adv Neural Inf Process Syst 1:568–576

    Google Scholar 

  14. Vondrick C, Pirsiavash H, Torralba A (2016) Anticipating visual representations from unlabeled video. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 98–106

  15. Wang H, Yang W, Yuan C, Ling H, Hu W (2017) Human activity prediction using temporally-weighted generalized time warping. Neurocomputing 225:139–147. https://doi.org/10.1016/j.neucom.2016.11.004

    Article  Google Scholar 

  16. Wang H, Yuan C, Shen J, Yang W, Ling H (2018) Action unit detection and key frame selection for human activity prediction. Neurocomputing 318:109–119. https://doi.org/10.1016/j.neucom.2018.08.037

    Article  Google Scholar 

  17. Wang L, Xiong Y, Wang Z, Qiao Y, Lin D, Tang X, van Gool L (2016) Temporal segment networks: towards good practices for deep action recognition. In: Lecture notes in computer science (including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics), vol 9912. LNCS, pp 20–36. https://doi.org/10.1007/978-3-319-46484-8_2. arXiv:1608.00859

    Chapter  Google Scholar 

  18. Xu Z, Qing L, Miao J (2015) Activity auto-completion: Predicting human activities from partial videos. In: Proceedings of the IEEE international conference on computer vision, pp 3191–3199. https://doi.org/10.1109/ICCV.2015.365

  19. Zanfir M, Leordeanu M, Sminchisescu C (2013) The moving pose: an efficient 3D kinematics descriptor for low-latency action recognition and detection. In: Proceedings of the IEEE international conference on computer vision, pp 2752–2759. https://doi.org/10.1109/ICCV.2013.342

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Farzin Yaghmaee.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Saremi, M., Yaghmaee, F. Probabilistic selection of frames for early action recognition in videos. Int J Multimed Info Retr 8, 253–257 (2019). https://doi.org/10.1007/s13735-019-00182-x

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s13735-019-00182-x

Keywords

Navigation