
Research on human behavior recognition in factory environment based on 3-2DCNN-BIGRU fusion network

  • Original Paper
Signal, Image and Video Processing

Abstract

In the safety monitoring of workers in manufacturing enterprises, recognition models face a heavy computational burden caused by multi-target, highly concurrent behavior. From the perspective of model structure design, this paper proposes a human behavior recognition model that combines multi-dimensional convolution with a gated recurrent neural network. A single-target human behavior dataset for the factory setting was constructed using YOLOv7 object detection and BoT-SORT multi-target tracking. In the mixed spatio-temporal feature extraction layer of the 3-2DCNN-BIGRU model, a 3DCNN exploits its strength in spatio-temporal feature extraction; a 3-2DCNN then extracts spatial features after dimensionality reduction, lowering the computational cost and the complexity of the model; and dilated convolution, borrowed from temporal convolutional networks, enlarges the receptive fields of the 3DCNN and 3-2DCNN, enhancing the model's spatio-temporal feature extraction. In the temporal feature enhancement layer, a bidirectional gated recurrent neural network is fused in to strengthen temporal feature extraction, improving the model's overall performance. With fewer parameters, the model reaches 98.65% accuracy on the Fall Dataset and can effectively recognize human behaviors such as walking, sitting, and falling in factories, helping to ensure worker safety.
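To make the layer arrangement concrete, the following is a minimal PyTorch sketch of the architecture the abstract describes: a dilated 3D convolution for mixed spatio-temporal features, a cheaper dilated 2D convolution applied per frame once the temporal axis is folded into the batch (the 3-2DCNN step), and a bidirectional GRU as the temporal feature enhancement layer. All channel counts, kernel sizes, dilation rates, pooling layout, and the class count are illustrative assumptions, not the authors' published configuration.

    import torch
    import torch.nn as nn

    # Hypothetical sketch of the 3-2DCNN-BIGRU idea; hyperparameters are assumptions.
    class Hybrid3D2DBiGRU(nn.Module):
        def __init__(self, num_classes=3, hidden=128):
            super().__init__()
            # 3DCNN stage: a dilated temporal kernel widens the receptive field,
            # echoing the dilated-convolution idea from temporal convolutional networks.
            self.conv3d = nn.Sequential(
                nn.Conv3d(3, 32, kernel_size=(3, 3, 3), padding=(2, 1, 1), dilation=(2, 1, 1)),
                nn.BatchNorm3d(32), nn.ReLU(),
                nn.MaxPool3d((1, 2, 2)),
            )
            # 3-2DCNN stage: after folding time into the batch, cheaper 2D convolutions
            # extract spatial features, reducing computation and model complexity.
            self.conv2d = nn.Sequential(
                nn.Conv2d(32, 64, kernel_size=3, padding=2, dilation=2),
                nn.BatchNorm2d(64), nn.ReLU(),
                nn.AdaptiveAvgPool2d((4, 4)),
            )
            # Temporal feature enhancement layer: a bidirectional GRU over per-frame features.
            self.bigru = nn.GRU(64 * 4 * 4, hidden, batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * hidden, num_classes)

        def forward(self, x):                  # x: (batch, 3, frames, H, W)
            f = self.conv3d(x)                 # (batch, 32, frames, H/2, W/2)
            b, c, t, h, w = f.shape
            f = f.transpose(1, 2).reshape(b * t, c, h, w)   # fold time into the batch
            f = self.conv2d(f).reshape(b, t, -1)            # (batch, frames, 1024)
            out, _ = self.bigru(f)             # (batch, frames, 2 * hidden)
            return self.head(out[:, -1])       # classify from the final time step

    clip = torch.randn(2, 3, 16, 112, 112)     # two 16-frame RGB clips
    print(Hybrid3D2DBiGRU()(clip).shape)       # torch.Size([2, 3])

Folding the temporal axis into the batch dimension is one plausible reading of "extract spatial features after dimension reduction": it lets the spatial stage run as plain 2D convolution, which is substantially cheaper than keeping a 3D kernel throughout the network.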


Data availability

No datasets were generated or analysed during the current study.


Funding

The research was supported by the Natural Science Foundation of Shaanxi Province (Nos. 2021SF-422 and 2024GX-YBXM-190).

Author information


Contributions

This paper was written by Wang Zhenyu, who independently completed the research design, data analysis, and interpretation of results, and provided in-depth insights into the findings. Zheng Jianming provided guidance on the overall structure of the paper. Yang Mingshun advised on the experimental design and data analysis. Shi Weichao reviewed the literature and provided the knowledge framework for the background section. Su Yulong revised and proofread the paper. Chen Ting provided experiment-related resources and key support during the design and implementation of the experiments. Peng Chao polished the language of the paper and refined its figures.

Corresponding author

Correspondence to Jianming Zheng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wang, Z., Zheng, J., Yang, M. et al. Research on human behavior recognition in factory environment based on 3-2DCNN-BIGRU fusion network. SIViP 19, 102 (2025). https://doi.org/10.1007/s11760-024-03613-3

