Overview of behavior recognition based on deep learning

Hu, Kai; Jin, Junlan; Zheng, Fei; Weng, Liguo; Ding, Yiwu

doi:10.1007/s10462-022-10210-8

Published: 21 June 2022

Volume 56, pages 1833–1865 (2023)

Artificial Intelligence Review

Kai Hu (ORCID: 0000-0001-7181-9935)^1,2,
Junlan Jin^1,2,na1,
Fei Zheng^2,3,na1,
Liguo Weng^1,2,na1 &
Yiwu Ding^1,2,na1

Abstract

Human behavior recognition has long been an active research topic in computer vision. With the wide application of behavior recognition in virtual reality and short video in recent years, and with the rapid development of deep learning, many behavior recognition algorithms based on deep learning have emerged. Compared with traditional methods, these algorithms offer stronger robustness and higher accuracy. This paper organizes and introduces the deep-learning-based behavior recognition algorithms proposed in recent years, focuses on a series of algorithms based on image data and skeleton (bone) data, analyzes their theory and performance in depth, and finally discusses prospects for future work.

Data Availability

The data and code used to support the findings of this study are available from the corresponding author upon request (001600@nuist.edu.cn).

Acknowledgements

This research was supported by the National Natural Science Foundation of China (No. 61876079) and the key special project of the National Key R&D Program (2018YFC1405703). The financial support of Jiangsu Austin Optronics Technology Co., Ltd. is deeply appreciated. The authors also thank the reviewers and editors who offered valuable suggestions for revising this article.

Author information

Junlan Jin, Fei Zheng, Liguo Weng and Yiwu Ding have contributed equally to this work.

Authors and Affiliations

School of Automation, Nanjing University of Information Science and Technology, No.219, Ningliu Road, 210044, Nanjing, Jiangsu, China
Kai Hu, Junlan Jin, Liguo Weng & Yiwu Ding
Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science and Technology, No.219, Ningliu Road, 210044, Nanjing, Jiangsu, China
Kai Hu, Junlan Jin, Fei Zheng, Liguo Weng & Yiwu Ding
Innovation Department of Industrial Internet, China Telecom Ningbo Branch, No.96 HeYi Road, 315000, Ningbo, Zhejiang, China
Fei Zheng

Contributions

All authors drafted the manuscript and read and approved the final version.

Corresponding author

Correspondence to Kai Hu.

Ethics declarations

Conflict of interest

No potential conflict of interest was reported by the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hu, K., Jin, J., Zheng, F. et al. Overview of behavior recognition based on deep learning. Artif Intell Rev 56, 1833–1865 (2023). https://doi.org/10.1007/s10462-022-10210-8

Issue Date: March 2023
DOI: https://doi.org/10.1007/s10462-022-10210-8
