More efficient and effective tricks for deep action recognition

Cluster Computing (2019)

Abstract

Deep convolutional networks have achieved great success in visual recognition of static images, yet they hold no clear advantage over traditional methods in video action recognition. Although two-stream convolutional networks achieve the best performance in human action recognition, obstacles remain, such as selecting pre-trained models and hyper-parameters, and high computational cost. In this paper, we propose two efficient and effective methods for action recognition based on the two-stream convolutional network: (1) reducing the computational cost of the temporal stream while achieving the same accuracy, and (2) providing techniques for assembling an action recognition pipeline, including the selection of the optical flow algorithm, the pre-training dataset/architecture, and the hyper-parameters. Experimental results show that we obtain performance on a par with the state of the art on the HMDB51 (70.9%) and UCF101 (95.4%) datasets.
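The two-stream approach combines a spatial stream (RGB frames) and a temporal stream (stacked optical flow) and fuses their per-class scores at the end. A minimal sketch of such late fusion is shown below; the softmax averaging and the temporal weight of 1.5 are illustrative assumptions, not the settings reported in this paper.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def two_stream_fusion(spatial_logits, temporal_logits, w_temporal=1.5):
    """Late fusion of the two streams: weighted average of per-stream
    class probabilities, then argmax to get the predicted action class.
    The 1.5 weight on the temporal stream is a commonly used illustrative
    choice, not necessarily the weighting used in this paper."""
    p_spatial = softmax(spatial_logits)
    p_temporal = softmax(temporal_logits)
    fused = (p_spatial + w_temporal * p_temporal) / (1.0 + w_temporal)
    return int(fused.argmax(axis=-1))
```

For example, if the temporal stream strongly favors one class while the spatial stream only weakly favors another, the up-weighted temporal evidence decides the prediction.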



Acknowledgements

The authors of this paper are members of the Shanghai Engineering Research Center of Intelligent Video Surveillance. Dr. Lei Song is also a visiting researcher with the Shenzhen Key Laboratory of Media Security, Shenzhen University, Shenzhen 518060, China. Our research was sponsored by the following projects: the National Natural Science Foundation of China (61402116, 61403084); the Program of the Science and Technology Commission of Shanghai Municipality (Nos. 15530701300, 15XD15202000); the 2012 IoT Program of the Ministry of Industry and Information Technology of China; the Key Project of the Ministry of Public Security (No. 2014JSYJA007); the Project of the Key Laboratory of Embedded System and Service Computing, Ministry of Education, Tongji University (ESSCKF 2015-03); the Shanghai Rising-Star Program (17QB1401000); and the Special Fund for Basic R&D Expenses of Central Level Public Welfare Scientific Research Institutions (C17384).

Author information


Corresponding author

Correspondence to Lei Song.


About this article


Cite this article

Liu, Z., Zhang, X., Song, L. et al. More efficient and effective tricks for deep action recognition. Cluster Comput 22 (Suppl 1), 819–826 (2019). https://doi.org/10.1007/s10586-017-1309-2
