Abstract
We propose a deep-learning-based technique for action classification built on Long Short-Term Memory (LSTM) networks. The proposed scheme first learns spatio-temporal features from the video using an extension of Convolutional Neural Networks (CNNs) to 3D. A Recurrent Neural Network (RNN) is then trained to classify each sequence by considering the temporal evolution of the learned features at each time step. Experimental results on the CMU MoCap, UCF101, and Hollywood2 datasets show the efficacy of the proposed approach. We further extend the framework with an efficient motion feature that enables it to handle significant camera motion. The proposed approach outperforms existing deep models on each dataset.
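To make the two-stage architecture concrete, the following is a minimal sketch (not the authors' code) of the pipeline the abstract describes: a 3D CNN extracts a spatio-temporal feature per short clip, and an LSTM classifies the video from the sequence of clip features. All layer sizes, the clip length, the feature dimension, and the 101-class output (matching UCF101) are assumed values chosen for illustration.

```python
# Illustrative sketch of a 3D-CNN + LSTM action classifier.
# Hyperparameters (channel widths, feat_dim, clip length) are assumptions,
# not the paper's actual configuration.
import torch
import torch.nn as nn

class CNN3D_LSTM(nn.Module):
    def __init__(self, num_classes=101, feat_dim=256, hidden_dim=256):
        super().__init__()
        # 3D convolutions learn spatio-temporal features from short clips.
        self.cnn3d = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((2, 2, 2)),
            nn.AdaptiveAvgPool3d(1),       # -> (B*T, 128, 1, 1, 1)
        )
        self.proj = nn.Linear(128, feat_dim)
        # The LSTM models the temporal evolution of clip-level features.
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, clips):
        # clips: (B, T, C, D, H, W) -- T short clips per video.
        B, T = clips.shape[:2]
        x = clips.flatten(0, 1)            # (B*T, C, D, H, W)
        f = self.cnn3d(x).flatten(1)       # (B*T, 128)
        f = self.proj(f).view(B, T, -1)    # (B, T, feat_dim)
        out, _ = self.lstm(f)
        return self.fc(out[:, -1])         # classify from the last time step

# Example: 2 videos, 4 clips each, clips of 8 RGB frames at 112x112.
model = CNN3D_LSTM()
logits = model(torch.randn(2, 4, 3, 8, 112, 112))
print(logits.shape)  # torch.Size([2, 101])
```

In this sketch the motion feature mentioned in the abstract (improved dense trajectories, per the title) would be computed separately and fed alongside the CNN features; it is omitted here for brevity.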
Acknowledgements
The authors gratefully acknowledge the financial support of the Science and Engineering Research Board (SERB), Department of Science and Technology (DST), Government of India, under project ECR/2016/000652.
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Singh, K.K., Mukherjee, S. (2018). Recognizing Human Activities in Videos Using Improved Dense Trajectories over LSTM. In: Rameshan, R., Arora, C., Dutta Roy, S. (eds.) Computer Vision, Pattern Recognition, Image Processing, and Graphics. NCVPRIPG 2017. Communications in Computer and Information Science, vol. 841. Springer, Singapore. https://doi.org/10.1007/978-981-13-0020-2_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0019-6
Online ISBN: 978-981-13-0020-2
eBook Packages: Computer Science, Computer Science (R0)