Abstract
Video action recognition has been a challenging task over the years. The challenge arises not only from the growing volume of information in videos but also from the need for an efficient method that retains information over the longer time span a human action takes to perform. This paper proposes a novel framework, named long-term video action recognition (LVAR), to perform generic action classification in continuous video. The idea of LVAR is to introduce a partial recurrence connection that propagates information within every layer of a spatial-temporal network, such as the well-known C3D. Empirically, we show that this addition allows the C3D network to access long-term information and subsequently improves action recognition performance on videos of different lengths selected from both the UCF101 and miniKinetics datasets. Our approach is further confirmed by experiments on untrimmed videos from the THUMOS14 dataset.
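To make the abstract's central idea concrete, the following is a minimal PyTorch sketch of a C3D-style layer with a partial recurrence connection: a hidden state carried across consecutive clips is fused back into a subset of the layer's channels. This excerpt does not spell out the paper's exact formulation, so the class name PartiallyRecurrentConv3d, the rec_ch channel split, and the 1x1x1 fusion convolution are illustrative assumptions, not the authors' definitive design.

```python
import torch
import torch.nn as nn

class PartiallyRecurrentConv3d(nn.Module):
    """A C3D-style 3D convolution whose first `rec_ch` output channels are
    mixed with a hidden state carried over from the previous clip, so the
    layer can retain information beyond a single clip (hypothetical sketch)."""

    def __init__(self, in_ch: int, out_ch: int, rec_ch: int):
        super().__init__()
        assert rec_ch <= out_ch
        self.rec_ch = rec_ch                      # only these channels recur ("partial")
        self.conv = nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1)
        self.mix = nn.Conv3d(2 * rec_ch, rec_ch, kernel_size=1)  # fuse state + features

    def forward(self, x, state=None):
        y = torch.relu(self.conv(x))              # (B, out_ch, T, H, W)
        rec, rest = y[:, :self.rec_ch], y[:, self.rec_ch:]
        if state is not None:                     # inject context from the previous clip
            rec = torch.relu(self.mix(torch.cat([state, rec], dim=1)))
        # Keep the last time step as context for the next clip (see the Notes below).
        new_state = rec[:, :, -1:].expand_as(rec).detach()
        return torch.cat([rec, rest], dim=1), new_state

# Usage: process a long video as a stream of clips, threading the state through.
layer = PartiallyRecurrentConv3d(in_ch=3, out_ch=64, rec_ch=16)
state = None
for clip in torch.randn(2, 3, 3, 8, 56, 56).unbind(dim=1):  # 2 videos x 3 clips
    out, state = layer(clip, state)                          # out: (2, 64, 8, 56, 56)
```

Threading such a state through every layer of the network is what, per the abstract, lets a clip-based model like C3D access information well beyond a single clip.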
Notes
Segmenting out the last section along the temporal axis ensures that the contextual information transferred is relevant to the current event.
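As a concrete reading of this note, the context handed to the next clip can simply be the last segment of the feature tensor along the temporal axis, rather than features pooled over the whole clip. The segment length k below is an assumed hyperparameter, not a value given in the paper.

```python
import torch

k = 4                                     # assumed length of the "last section"
features = torch.randn(2, 64, 16, 7, 7)  # (B, C, T, H, W) activations of one clip
context = features[:, :, -k:]             # keep only the last k frames: (2, 64, 4, 7, 7)
```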
References
Carreira J, Zisserman A (2017) Quo vadis, action recognition? A new model and the Kinetics dataset. In: CVPR
Donahue J, Anne Hendricks L, Guadarrama S, Rohrbach M, Venugopalan S, Saenko K, Darrell T (2015) Long-term recurrent convolutional networks for visual recognition and description. In: CVPR, pp 2625–2634
Feichtenhofer C, Pinz A, Zisserman A (2016) Convolutional two-stream network fusion for video action recognition. In: CVPR, pp 1933–1941
He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. In: ICCV, pp 1026–1034
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: CVPR, pp 770–778
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Jiang X, Sun J, Li C, Ding H (2018) Video image defogging recognition based on recurrent neural network. TII 14(7):3281–3288
Jiang Y-G, Liu J, Zamir AR, Toderici G, Laptev I, Shah M, Sukthankar R (2014) THUMOS challenge: action recognition with a large number of classes. Retrieved from https://www.crcv.ucf.edu/THUMOS14/results.html
Krizhevsky A, Sutskever I, Hinton G (2012) ImageNet classification with deep convolutional neural networks. In: NIPS, pp 1097–1105
van der Maaten L, Hinton G (2008) Visualizing data using t-SNE. JMLR 9(Nov):2579–2605
Muhammad K, Hamza R, Ahmad J, Lloret J, Wang H, Baik SW (2018) Secure surveillance framework for IoT systems using probabilistic image encryption. TII 14(8):3679–3689
Ng JYH, Hausknecht M, Vijayanarasimhan S, Vinyals O, Monga R, Toderici G (2015) Beyond short snippets: deep networks for video classification. In: CVPR, pp 4694–4702
Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M et al (2015) ImageNet large scale visual recognition challenge. IJCV 115(3):211–252
Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos. In: NIPS, pp 568–576
Soomro K, Zamir AR, Shah M (2012) UCF101: a dataset of 101 human actions classes from videos in the wild. Technical report CRCV-TR-12-01
Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: AAAI, vol 4, p 12
Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3D convolutional networks. In: ICCV, pp 4489–4497
Tran D, Wang H, Torresani L, Ray J, LeCun Y, Paluri M (2018) A closer look at spatiotemporal convolutions for action recognition. In: CVPR, pp 6450–6459
Varol G, Laptev I, Schmid C (2018) Long-term temporal convolutions for action recognition. TPAMI 40(6):1510–1517
Wang H, Schmid C (2013) Action recognition with improved trajectories. In: ICCV, pp 3551–3558
Wang L, Xiong Y, Wang Z, Qiao Y (2015) Towards good practices for very deep two-stream convnets. arXiv:1507.02159
Wang L, Xiong Y, Wang Z, Qiao Y, Lin D, Tang X, Van Gool L (2016) Temporal segment networks: towards good practices for deep action recognition. In: ECCV. Springer, pp 20–36
Wang P, Cao Y, Shen C, Liu L, Shen HT (2017) Temporal pyramid pooling-based convolutional neural network for action recognition. TCSVT 27(12):2613–2622
Wu CY, Feichtenhofer C, Fan H, He K, Krähenbühl P, Girshick R (2018) Long-term feature banks for detailed video understanding. arXiv:1812.05038
Zeng Z, Li Z, Cheng D, Zhang H, Zhan K, Yang Y (2018) Two-stream multirate recurrent neural network for video-based pedestrian reidentification. TII 14(7):3179–3186
Acknowledgements
This research is supported by the Fundamental Research Grant Scheme (FRGS) MoHE Grant FP021-2018A, from the Ministry of Education Malaysia, and the Postgraduate Research Grant (PPP) Grant PG006-2016A, from the University of Malaya, Malaysia. We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.
Cite this article
Chang, Y.L., Chan, C.S. & Remagnino, P. Action recognition on continuous video. Neural Comput & Applic 33, 1233–1243 (2021). https://doi.org/10.1007/s00521-020-04982-9