Egocentric Human-Object Interaction Detection Exploiting Synthetic Data

  • Conference paper
  • First Online:
Image Analysis and Processing – ICIAP 2022 (ICIAP 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13232)

Abstract

We consider the problem of detecting Egocentric Human-Object Interactions (EHOIs) in industrial contexts. Since collecting and labeling large amounts of real images is challenging, we propose a pipeline and a tool to generate photo-realistic synthetic First Person Vision (FPV) images automatically labeled for EHOI detection in a specific industrial scenario. To tackle the problem of EHOI detection, we propose a method that detects the hands and the objects in the scene, and determines which objects are currently involved in an interaction. We compare the performance of our method with a set of state-of-the-art baselines. Results show that using a synthetic dataset improves the performance of an EHOI detection system, especially when few real data are available. To encourage research on this topic, we publicly release the proposed dataset at the following URL: https://iplab.dmi.unict.it/EHOI_SYNTH/.
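
The abstract describes the approach only at a high level: detect the hands, detect the objects, and decide which objects are involved in an interaction. Purely as an illustrative sketch, and not the authors' actual model, the final association step could be approximated by linking each detected hand to overlapping object boxes; the `Detection` container, the IoU threshold, and the `associate_active_objects` helper below are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """Hypothetical container for one detector output."""
    box: List[float]  # [x1, y1, x2, y2] in pixels
    score: float      # detector confidence
    label: str        # class name, e.g. "hand" or an object category


def iou(a: List[float], b: List[float]) -> float:
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def associate_active_objects(hands: List[Detection],
                             objects: List[Detection],
                             iou_thr: float = 0.1) -> List[Detection]:
    """Mark as 'active' every object whose box overlaps a detected hand.

    This fixed-overlap rule is only a stand-in for whatever learned
    association the actual method uses.
    """
    return [obj for obj in objects
            if any(iou(hand.box, obj.box) > iou_thr for hand in hands)]


if __name__ == "__main__":
    hands = [Detection([100, 120, 180, 200], 0.95, "hand")]
    objects = [Detection([150, 150, 260, 240], 0.88, "screwdriver"),
               Detection([400, 60, 480, 140], 0.91, "socket")]
    print([o.label for o in associate_active_objects(hands, objects)])  # ['screwdriver']
```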

Notes

  1. Ego4D Website: https://ego4d-data.org/.

  2. See supplementary material for more details.

  3. https://www.artec3d.com/portable-3d-scanners/artec-eva-v2.

  4. https://matterport.com/.

  5. We used the following implementation: https://github.com/cocodataset/cocoapi (a minimal evaluation sketch is given after this list).

  6. YOLOv5: https://github.com/ultralytics/yolov5.
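
Note 5 above points to the official COCO API, which is the standard way to compute detection mAP from COCO-format annotations. The sketch below shows how that implementation is typically driven for box mAP; the file names `gt_annotations.json` and `detections.json` are placeholders, not files released with the paper.

```python
# Minimal pycocotools evaluation sketch. The two JSON file names are
# placeholders: ground truth must be in COCO annotation format and the
# detections in the COCO results format.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("gt_annotations.json")         # ground-truth boxes
coco_dt = coco_gt.loadRes("detections.json")  # detector outputs

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
evaluator.summarize()  # prints AP, AP50, AP75 and size-stratified AP/AR
```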

Acknowledgements

This research has been supported by Next Vision (https://www.nextvisionlab.it/) s.r.l., by the project MISE - PON I&C 2014–2020 - Progetto ENIGMA - Prog n. F/190050/02/X44 - CUP: B61B19000520008, and by Research Program Pia.ce.ri. 2020/2022 Linea 2 - University of Catania.

Author information

Corresponding author

Correspondence to Rosario Leonardi.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 11,758 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Leonardi, R., Ragusa, F., Furnari, A., Farinella, G.M. (2022). Egocentric Human-Object Interaction Detection Exploiting Synthetic Data. In: Sclaroff, S., Distante, C., Leo, M., Farinella, G.M., Tombari, F. (eds) Image Analysis and Processing – ICIAP 2022. ICIAP 2022. Lecture Notes in Computer Science, vol 13232. Springer, Cham. https://doi.org/10.1007/978-3-031-06430-2_20

  • DOI: https://doi.org/10.1007/978-3-031-06430-2_20

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06429-6

  • Online ISBN: 978-3-031-06430-2

  • eBook Packages: Computer Science, Computer Science (R0)
