Abstract

This paper introduces a multimodal dataset for research on digital twins in the manufacturing domain. Digital twins are digital representations of physical-world objects, and modeling them accurately requires data. Incorporating multiple data modalities allows the digital twin representation in a computational environment to become more complex and precise. To this end, we propose a dataset of videos recorded inside a manufacturing laboratory, featuring different people performing assembly sequences in varying ways. In addition to the videos, we incorporated facial, lateral, and top captures to analyze the pose of the subjects, the position of hands and tools, and the actions performed during product assembly. The dataset is labeled with three actions (hold, release, screw) for four kinds of tools (ratchet, wrench, Allen key, screwdriver), indicating when the subject starts and ends each action with each tool.
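The labeling scheme described above (an action, a tool, and a start and end time) can be pictured as a list of time segments. The following is a minimal sketch only: the abstract does not specify the annotation file format, so the `ActionSegment` class, its field names, the use of seconds as the time unit, and the example values are all hypothetical.

```python
from dataclasses import dataclass

# Label vocabularies taken from the abstract.
ACTIONS = {"hold", "release", "screw"}
TOOLS = {"ratchet", "wrench", "allen key", "screwdriver"}


@dataclass(frozen=True)
class ActionSegment:
    """One labeled interval: a subject performs `action` with `tool`.

    Hypothetical structure; times are assumed to be seconds from video start.
    """
    tool: str
    action: str
    start_s: float
    end_s: float

    def __post_init__(self):
        # Reject labels outside the vocabularies and empty/inverted intervals.
        if self.tool not in TOOLS:
            raise ValueError(f"unknown tool: {self.tool!r}")
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action: {self.action!r}")
        if self.end_s <= self.start_s:
            raise ValueError("end_s must be greater than start_s")

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def labels_at(segments, t):
    """Return the (tool, action) pairs whose interval contains time t."""
    return [(s.tool, s.action) for s in segments if s.start_s <= t < s.end_s]


# Made-up example values, for illustration only.
segments = [
    ActionSegment("wrench", "hold", 2.0, 3.5),
    ActionSegment("wrench", "screw", 3.5, 9.0),
    ActionSegment("wrench", "release", 9.0, 9.8),
]
```

Storing each label as an interval rather than per-frame tags keeps the start and end of every action explicit, which is the information the abstract says the dataset provides; per-frame labels can be derived with a lookup such as `labels_at(segments, 4.0)`.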

Notes

  1. https://github.com/david-alfarov/-HAMD-ME/blob/main/README.md.

Author information

Corresponding author

Correspondence to David Alfaro-Viquez.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Alfaro-Viquez, D., Zamora-Hernandez, MA., Grillo, H., Garcia-Rodriguez, J., Azorín-López, J. (2023). A Multimodal Dataset to Create Manufacturing Digital Twins. In: García Bringas, P., et al. 18th International Conference on Soft Computing Models in Industrial and Environmental Applications (SOCO 2023). SOCO 2023. Lecture Notes in Networks and Systems, vol 750. Springer, Cham. https://doi.org/10.1007/978-3-031-42536-3_16
