Abstract
Human falls during ladder climbing occur almost instantaneously, which makes the timely and accurate detection of safety risks during climbing a challenging engineering problem. A skeleton-based behaviour recognition method is proposed for the real-time detection of risky human behaviour during ladder climbing at construction sites. The method uses a multi-modal feature fusion strategy to enrich the semantic information of the skeletal data and performs behaviour recognition with an adaptive graph convolutional network improved through partial dense connections. It was evaluated through ablation and comparative experiments on a public behaviour dataset, and the results demonstrated its advantage in balancing accuracy against model complexity. Experiments on a ladder-climbing behaviour dataset further validated its effectiveness in practical applications. The proposed method is expected to help safeguard the personal safety of construction workers and to provide information-based advance warning of safety risks during ladder climbing.
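To make the two named ingredients concrete, the following is a minimal toy sketch of one adaptive graph convolution step combined with a dense-style skip connection. All sizes, the learned adjacency offset, and the identity weight matrix are illustrative assumptions, not the authors' implementation; a real model operates on batched spatio-temporal tensors with many such layers.

```python
# Toy adaptive graph convolution over a 3-joint skeleton, in the spirit of
# adaptive GCNs for skeleton data: features propagate over a fixed body-graph
# adjacency plus a learned adjacency offset, then a dense-style connection
# concatenates the layer input with its output.

def matmul(a, b):
    """Multiply two matrices given as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matadd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def adaptive_graph_conv(x, a_fixed, b_learned, w):
    """One layer: propagate features over (A + B), then transform them.

    x         : V x C node features (V joints, C channels)
    a_fixed   : V x V adjacency fixed by the human body graph
    b_learned : V x V learned adjacency offset (the adaptive part)
    w         : C x C weight matrix
    """
    a_adapt = matadd(a_fixed, b_learned)   # adaptive adjacency
    return matmul(matmul(a_adapt, x), w)   # aggregate neighbours, then transform

# 3 joints, 2 channels; identity weights keep the numbers readable.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]        # chain graph: joint 0 - 1 - 2
b = [[0, 0, 0.5], [0, 0, 0], [0.5, 0, 0]]    # learned link between joints 0 and 2
w = [[1.0, 0.0], [0.0, 1.0]]

h = adaptive_graph_conv(x, a, b, w)
# Dense-style connection: later layers see both the raw and the aggregated
# features for each joint.
dense_out = [xi + hi for xi, hi in zip(x, h)]
print(dense_out[0])  # joint 0: original 2 channels followed by 2 aggregated ones
```

The learned offset `b` lets the network model dependencies (such as hand-to-hand coordination on a ladder) that the fixed skeleton graph does not encode, which is the motivation behind adaptive adjacency in this line of work.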
Data availability
All data are available from the corresponding author on reasonable request.
Funding
This research was supported by the Humanities and Social Science Project of Anhui Provincial Department of Education under Grant: 2022AH050224.
Author information
Contributions
WZ contributed to conceptualization and writing; WZ and RC were involved in methodology; DS and TH contributed to formal analysis; WZ and RH were involved in investigation; WZ, TH, RC, JW, and RH contributed to data curation; and DS was involved in supervision. All authors have read and agreed to the published version of the manuscript.
About this article
Cite this article
Zhu, W., Shi, D., Cheng, R. et al. Human risky behaviour recognition during ladder climbing based on multi-modal feature fusion and adaptive graph convolutional network. SIViP 18, 2473–2483 (2024). https://doi.org/10.1007/s11760-023-02923-2