Abstract
Pose estimation is a computer vision task that estimates the skeleton of a dynamic system, which can then be used to predict its future movements. Most research in this direction relies on supervised learning, which requires massive amounts of labeled data. This paper introduces a self-supervised three-stage model, based on contrastive learning, for estimating the skeleton of dynamic construction machines such as excavators without using any labeled images in the first stage. The model is divided into three stages: a pre-training stage using the SimCLR contrastive approach, followed by two fine-tuning stages for transfer learning and the downstream task. The model can leverage features learned from a huge unlabeled dataset called ACID and transfer them to two small datasets generated with the NVIDIA Isaac and MATLAB Simscape simulators, as well as to a smaller dataset amounting to 3.5% of the original ACID dataset. The results show that the proposed approach improves the accuracy of pose estimation for heavy construction machines in real images by 11% and 13% compared with the normal self-supervised approach, for the ResNet-50 and HRNet-W32 backbones, respectively.
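The pre-training stage named in the abstract uses the SimCLR contrastive approach, whose objective is the normalized temperature-scaled cross-entropy (NT-Xent) loss. The following is a minimal NumPy sketch of that generic loss, not the authors' implementation. The function name `nt_xent_loss`, the temperature of 0.5, the batch size, and the embedding dimension are all assumptions chosen for illustration.

```python
import numpy as np


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """Generic NT-Xent loss (illustrative sketch, hypothetical helper).

    z_a, z_b: arrays of shape (n, d); row i of each holds the embedding
    of a different augmented view of the same image i.
    temperature: assumed value, not taken from the paper.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)            # (2n, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-length rows
    sim = (z @ z.T) / temperature                     # scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                    # a view is never its own negative

    # Row i (a view from z_a) has its positive at row i + n, and vice versa.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])

    # Numerically stable log-softmax over each row.
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    log_prob_pos = sim[np.arange(2 * n), pos] - log_denom
    return float(-log_prob_pos.mean())


# Usage: embeddings of matching views should give a lower loss than unrelated ones.
rng = np.random.default_rng(0)
anchor = rng.normal(size=(8, 16))
aligned = anchor + 0.01 * rng.normal(size=(8, 16))   # stand-in for a second view
unrelated = rng.normal(size=(8, 16))
loss_aligned = nt_xent_loss(anchor, aligned)
loss_unrelated = nt_xent_loss(anchor, unrelated)
```

In a real pipeline the embeddings would come from a backbone such as ResNet-50 or HRNet-W32 followed by a projection head, and the loss would be computed in an autodiff framework so that gradients reach the backbone.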
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Alshubbak, A., Görges, D. (2023). A Self-supervised Pose Estimation Approach for Construction Machines. In: Bebis, G., et al. Advances in Visual Computing. ISVC 2023. Lecture Notes in Computer Science, vol 14362. Springer, Cham. https://doi.org/10.1007/978-3-031-47966-3_31
DOI: https://doi.org/10.1007/978-3-031-47966-3_31
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-47965-6
Online ISBN: 978-3-031-47966-3
eBook Packages: Computer Science (R0)