Learning accurate personal protective equipment detection from virtual worlds

Published in: Multimedia Tools and Applications

Abstract

Deep learning has achieved impressive results in many machine learning tasks, such as image recognition and computer vision. Its applicability to supervised problems is, however, constrained by the availability of high-quality training data consisting of large numbers of human-annotated examples (e.g., millions). To overcome this problem, the AI community has increasingly turned to artificially generated images and video sequences produced with photo-realistic rendering engines, such as those used in entertainment applications. In this way, large training sets can be created easily and cheaply. In this paper, we generate photo-realistic synthetic image sets to train deep learning models to recognize the correct use of personal protective equipment (e.g., worker safety helmets, high-visibility vests, ear protection devices) during at-risk work activities. We then perform domain adaptation to real-world images using a very small set of real examples. We demonstrate that training on the generated synthetic set, followed by this domain adaptation phase, is an effective solution for applications where no real-world training set is available.
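The two-phase recipe in the abstract (pretrain on abundant synthetic data, then adapt with a handful of real samples) can be sketched in miniature. The snippet below is a hypothetical toy illustration, not the paper's actual implementation: a linear model stands in for the object detector, the two "domains" are invented one-dimensional datasets, and all numbers are made up. It only shows why few-shot fine-tuning can close a systematic gap between a synthetic and a real domain.

```python
# Toy sketch of synthetic pretraining followed by few-shot domain adaptation.
# A linear model y = w*x + b stands in for the detector; the "domain gap" is
# a constant shift between the synthetic and real data distributions.

def sgd(w, b, data, lr, epochs):
    """Per-sample gradient descent on squared error."""
    for _ in range(epochs):
        for x, y in data:
            err = w * x + b - y
            w -= lr * err * x
            b -= lr * err
    return w, b

def mse(w, b, data):
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

# Phase 1: abundant "synthetic" domain, y = 2x, 100 samples.
synthetic = [(i / 100, 2 * i / 100) for i in range(100)]
w, b = sgd(0.0, 0.0, synthetic, lr=0.05, epochs=20)

# Phase 2: scarce "real" domain with a systematic shift (the domain gap):
# y = 2x + 1, observed at only five labeled points.
real = [(x, 2 * x + 1.0) for x in (0.1, 0.3, 0.5, 0.7, 0.9)]
loss_before = mse(w, b, real)                # large: the gap is unmodeled
w, b = sgd(w, b, real, lr=0.05, epochs=50)   # few-shot fine-tuning
loss_after = mse(w, b, real)                 # small: the model has adapted
```

In the paper's setting the analogous step is fine-tuning a detector pretrained on rendered images using a very small set of annotated real photographs; the mechanism, reusing synthetically learned parameters and correcting only the domain mismatch, is the same.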




Acknowledgements

This work was partially supported by “Automatic Data and documents Analysis to enhance human-based processes” (ADA), funded by CUP CIPE D55F17000290009, and by the AI4EU project, funded by EC (H2020 - Contract n. 825619). We gratefully acknowledge the support of NVIDIA Corporation with the donation of a Jetson TX2 board used for this research.

Author information

Corresponding author

Correspondence to Marco Di Benedetto.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Di Benedetto, M., Carrara, F., Meloni, E. et al. Learning accurate personal protective equipment detection from virtual worlds. Multimed Tools Appl 80, 23241–23253 (2021). https://doi.org/10.1007/s11042-020-09597-9

