
ALET (Automated Labeling of Equipment and Tools): A Dataset for Tool Detection and Human Worker Safety Detection

  • Conference paper
Computer Vision – ECCV 2020 Workshops (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12538)


Abstract

Robots collaborating with humans in realistic environments need to be able to detect the tools that can be used and manipulated. However, there is no available dataset or study that addresses this challenge in real settings. In this paper, we fill this gap with a dataset for detecting farming, gardening, office, stonemasonry, vehicle, woodworking, and workshop tools. The scenes in our dataset are snapshots of sophisticated environments with or without humans using the tools. The scenes we consider introduce several challenges for object detection, including the small scale of the tools, their articulated nature, occlusion, inter-class invariance, etc. Moreover, we train and compare several state-of-the-art deep object detectors (including Faster R-CNN, Cascade R-CNN, YOLOv3, RetinaNet, RepPoints, and FreeAnchor) on our dataset. We observe that the detectors struggle in particular with small-scale tools and with tools that are visually similar to parts of other tools. In addition, we provide a novel, practical safety use case with a deep network that checks whether a human worker is wearing a safety helmet, mask, glasses, and gloves. With the dataset, the code, and the trained models, our work provides a basis for further research into tools and their use in robotics applications.
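
For readers who want to experiment along these lines, the following is a minimal sketch (not the authors' released code or exact pipeline): it fine-tunes an off-the-shelf torchvision Faster R-CNN for tool detection and applies a simple box-overlap rule in the spirit of the safety use case. The class count, the gear category names, and the overlap threshold are illustrative assumptions, not values from the paper.

```python
# A minimal sketch, NOT the authors' pipeline: fine-tune a torchvision
# Faster R-CNN for tool detection and apply a simple box-overlap rule for
# a safety-gear check. NUM_CLASSES, the gear category names, and the
# overlap threshold below are illustrative assumptions.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 50  # hypothetical: number of tool categories + 1 for background


def build_tool_detector(num_classes: int = NUM_CLASSES):
    """Load a COCO-pretrained Faster R-CNN and resize its box head."""
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model  # fine-tune with a standard torchvision detection training loop


def overlap_fraction(gear_box, worker_box):
    """Fraction of a gear box (x1, y1, x2, y2) lying inside a worker box."""
    ix1, iy1 = max(gear_box[0], worker_box[0]), max(gear_box[1], worker_box[1])
    ix2, iy2 = min(gear_box[2], worker_box[2]), min(gear_box[3], worker_box[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    gear_area = (gear_box[2] - gear_box[0]) * (gear_box[3] - gear_box[1])
    return inter / (gear_area + 1e-9)


def worker_is_protected(worker_box, gear_detections,
                        required=("helmet", "mask", "glasses", "glove"),
                        thr=0.5):
    """Check that every required gear category overlaps the worker's box.

    gear_detections: iterable of (label, box) pairs produced by the detector.
    """
    found = {label for label, box in gear_detections
             if overlap_fraction(box, worker_box) > thr}
    return all(item in found for item in required)
```

The overlap rule above is only meant to illustrate the kind of per-worker check the safety use case performs; the paper's own network may associate gear with workers differently.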


Notes

  1. It would be more accurate to call some of these objects equipment. However, since they provide similar functionality (being used by a human or a robot while performing a task), we use the term tool to refer to all such objects, for the sake of simplicity.


Acknowledgment

This work was supported by the Scientific and Technological Research Council of Turkey (TÜBİTAK) through the project “CIRAK: Compliant robot manipulator support for montage workers in factories” (project no. 117E002). The numerical calculations reported in this paper were partially performed at the TÜBİTAK ULAKBIM High Performance and Grid Computing Center (TRUBA resources). We would like to thank Erfan Khalaji for his contributions to an earlier version of this work.

Author information

Corresponding author

Correspondence to Fatih Can Kurnaz.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Kurnaz, F.C., Hocaoğlu, B., Yılmaz, M.K., Sülo, İ., Kalkan, S. (2020). ALET (Automated Labeling of Equipment and Tools): A Dataset for Tool Detection and Human Worker Safety Detection. In: Bartoli, A., Fusiello, A. (eds.) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science, vol. 12538. Springer, Cham. https://doi.org/10.1007/978-3-030-66823-5_22


  • DOI: https://doi.org/10.1007/978-3-030-66823-5_22


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66822-8

  • Online ISBN: 978-3-030-66823-5

  • eBook Packages: Computer Science (R0)
