
ALET (Automated Labeling of Equipment and Tools): A Dataset for Tool Detection and Human Worker Safety Detection

  • Conference paper
Computer Vision – ECCV 2020 Workshops (ECCV 2020)

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 12538)


Abstract

Robots collaborating with humans in realistic environments need to be able to detect the tools that can be used and manipulated. However, there is no available dataset or study that addresses this challenge in real settings. In this paper, we fill this gap with a dataset for detecting farming, gardening, office, stonemasonry, vehicle, woodworking, and workshop tools. The scenes in our dataset are snapshots of sophisticated environments with or without humans using the tools. The scenes we consider introduce several challenges for object detection, including the small scale of the tools, their articulated nature, occlusion, inter-class invariance, etc. Moreover, we train and compare several state-of-the-art deep object detectors (including Faster R-CNN, Cascade R-CNN, YOLOv3, RetinaNet, RepPoints, and FreeAnchor) on our dataset. We observe that the detectors struggle in particular with small-scale tools and with tools that are visually similar to parts of other tools. In addition, we provide a novel, practical safety use case with a deep network that checks whether a human worker is wearing a safety helmet, mask, glasses, and gloves. With the dataset, the code, and the trained models, our work provides a basis for further research into tools and their use in robotics applications.
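
For readers who want to experiment along these lines, the following is a minimal sketch (not the authors' released code or exact pipeline): it fine-tunes an off-the-shelf torchvision Faster R-CNN for tool detection and applies a simple box-overlap rule in the spirit of the safety use case. The class count, the gear category names, and the overlap threshold are illustrative assumptions, not values from the paper.

```python
# A minimal sketch, NOT the authors' pipeline: fine-tune a torchvision
# Faster R-CNN for tool detection and apply a simple box-overlap rule for
# a safety-gear check. NUM_CLASSES, the gear category names, and the
# overlap threshold below are illustrative assumptions.
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

NUM_CLASSES = 50  # hypothetical: number of tool categories + 1 for background


def build_tool_detector(num_classes: int = NUM_CLASSES):
    """Load a COCO-pretrained Faster R-CNN and resize its box head."""
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model  # fine-tune with a standard torchvision detection training loop


def overlap_fraction(gear_box, worker_box):
    """Fraction of a gear box (x1, y1, x2, y2) lying inside a worker box."""
    ix1, iy1 = max(gear_box[0], worker_box[0]), max(gear_box[1], worker_box[1])
    ix2, iy2 = min(gear_box[2], worker_box[2]), min(gear_box[3], worker_box[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    gear_area = (gear_box[2] - gear_box[0]) * (gear_box[3] - gear_box[1])
    return inter / (gear_area + 1e-9)


def worker_is_protected(worker_box, gear_detections,
                        required=("helmet", "mask", "glasses", "glove"),
                        thr=0.5):
    """Check that every required gear category overlaps the worker's box.

    gear_detections: iterable of (label, box) pairs produced by the detector.
    """
    found = {label for label, box in gear_detections
             if overlap_fraction(box, worker_box) > thr}
    return all(item in found for item in required)
```

The overlap rule above is only meant to illustrate the kind of per-worker check the safety use case performs; the paper's own network may associate gear with workers differently.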


Notes

  1. It would be more accurate to call some of these objects equipment. However, since they provide similar functionality (being used by a human or a robot while performing a task), we use the term tool to refer to all such objects, for the sake of simplicity.


Acknowledgment

This work was supported by the Scientific and Technological Research Council of Turkey (TÜBİTAK) through the project “CIRAK: Compliant robot manipulator support for montage workers in factories” (project no. 117E002). The numerical calculations reported in this paper were partially performed at the TÜBİTAK ULAKBIM High Performance and Grid Computing Center (TRUBA resources). We would like to thank Erfan Khalaji for his contributions to an earlier version of this work.

Author information

Corresponding author

Correspondence to Fatih Can Kurnaz.



Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Kurnaz, F.C., Hocaoğlu, B., Yılmaz, M.K., Sülo, İ., Kalkan, S. (2020). ALET (Automated Labeling of Equipment and Tools): A Dataset for Tool Detection and Human Worker Safety Detection. In: Bartoli, A., Fusiello, A. (eds.) Computer Vision – ECCV 2020 Workshops. ECCV 2020. Lecture Notes in Computer Science, vol. 12538. Springer, Cham. https://doi.org/10.1007/978-3-030-66823-5_22


  • DOI: https://doi.org/10.1007/978-3-030-66823-5_22


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-66822-8

  • Online ISBN: 978-3-030-66823-5

  • eBook Packages: Computer Science (R0)
