Abstract
Early action recognition seeks to recognize a human action in a video when only part of the video has been observed. In this paper, we introduce an approach to this recognition task. Several offline (non-early) recognition works sample the frames of a video uniformly and use them to train the model. However, there is no reason that uniform sampling should be optimal for early recognition, so we propose a non-uniform sampling scheme tailored to it. The proposed method samples frames in such a way that earlier frames are more likely to be chosen; these frames are then used to train a deep network architecture. We compare our sampling approach with uniform sampling on the HMDB51 benchmark and further compare our method with other state-of-the-art early recognition works. The experimental results suggest that our sampling process yields better recognition accuracy than uniform sampling at the early stages of a video and that the proposed algorithm outperforms the state of the art.
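To make the idea of a sampling distribution biased toward earlier frames concrete, the following minimal Python sketch draws frame indices with exponentially decaying weights. The exponential weighting, the decay parameter, and the helper name sample_frames are illustrative assumptions for this sketch only, not the actual distribution used in the paper.

import numpy as np

def sample_frames(num_frames, num_samples, decay=0.05, seed=None):
    # Assign each frame index a weight that decays with its position,
    # so earlier frames are more likely to be chosen (illustrative choice).
    rng = np.random.default_rng(seed)
    idx = np.arange(num_frames)
    weights = np.exp(-decay * idx)
    probs = weights / weights.sum()          # normalize to a probability distribution
    chosen = rng.choice(idx, size=num_samples, replace=False, p=probs)
    return np.sort(chosen)                   # restore temporal order for the network input

# Example: pick 8 training frames from a 120-frame clip
print(sample_frames(120, 8, decay=0.05, seed=0))

Sampling without replacement and re-sorting the chosen indices keeps the temporal order that a frame-based deep network would expect, while the decaying weights concentrate the selected frames near the start of the video.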


