Abstract
Most typical methods for human action recognition in videos rely on features extracted by deep neural networks. Inspired by the temporal segment network, a sparse-temporal segment network for recognizing human actions is proposed. Since sparse features carry information about the moving objects in a video, such as marginal information that helps locate the target region and reduces interference from similar actions, the robust principal component analysis (RPCA) algorithm is used to extract sparse features that cope with background motion, illumination changes, noise, and poor image quality. Based on the different characteristics of the three data modalities, three parallel networks (an RGB-frame network, an optical-flow network, and a sparse-feature network) are constructed and then fused in several ways. Comparative evaluations on UCF101 demonstrate that the three modalities contain complementary features. Extensive subjective and objective experiments show that the sparse-temporal segment network reaches an accuracy of 94.2%, which is significantly better than several state-of-the-art algorithms.
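To make the sparse-feature step concrete, the following is a minimal sketch of robust PCA via the standard inexact augmented Lagrange multiplier (ALM) scheme; the paper does not specify which RPCA solver it uses, and the function names, parameters, and data layout below are illustrative assumptions only. A matrix D whose columns are vectorized frames is split into a low-rank background L and a sparse component S, and S (reshaped back into frames) would serve as input to the sparse-feature stream.

```python
# Hypothetical RPCA sketch (inexact ALM): D = L + S, with L low-rank
# (static background) and S sparse (moving foreground / actor).
import numpy as np

def shrink(x, tau):
    """Elementwise soft-thresholding (shrinkage) operator."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def svd_shrink(x, tau):
    """Singular-value thresholding operator."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return u @ np.diag(shrink(s, tau)) @ vt

def rpca(D, max_iter=500, tol=1e-7):
    """Solve min ||L||_* + lam * ||S||_1  subject to  D = L + S."""
    m, n = D.shape
    lam = 1.0 / np.sqrt(max(m, n))
    # Standard inexact-ALM initialization of the dual variable and step size.
    Y = D / max(np.linalg.norm(D, 2), np.max(np.abs(D)) / lam)
    mu = 1.25 / np.linalg.norm(D, 2)
    rho = 1.5
    L = np.zeros_like(D)
    S = np.zeros_like(D)
    for _ in range(max_iter):
        L = svd_shrink(D - S + Y / mu, 1.0 / mu)   # low-rank update
        S = shrink(D - L + Y / mu, lam / mu)       # sparse update
        Z = D - L - S
        Y = Y + mu * Z                             # dual ascent
        mu = rho * mu
        if np.linalg.norm(Z, 'fro') / np.linalg.norm(D, 'fro') < tol:
            break
    return L, S

# Usage (assumed layout): given grayscale frames of shape (T, H, W),
# build D of shape (H*W, T) with one vectorized frame per column,
# run rpca(D), and reshape the columns of S back to (T, H, W).
```

In the overall pipeline described in the abstract, these per-frame sparse maps would feed the sparse-feature stream, whose class scores are then fused with those of the RGB and optical-flow streams; the exact fusion strategy (e.g., weighted averaging of softmax scores) is one of the variants the paper compares.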
Acknowledgment
This work is supported by the National Natural Science Foundation of China (No. 61871241); the Ministry of Education Cooperation in Production and Education project (No. 201802302115); the Educational Science Research Subject of the China Transportation Education Research Association (Jiaotong Education Research 1802-118); the Science and Technology Program of Nantong (JC2018025, JC2018129); the Nantong University-Nantong Joint Research Center for Intelligent Information Technology (KFKT2017B04); the Nanjing University State Key Laboratory for Novel Software Technology (KFKT2019B15); and the Postgraduate Research and Practice Innovation Program of Jiangsu Province (KYCX19_2056).