Pose attention and object semantic representation-based human-object interaction detection network

Deng, Wei-Mo; Zhang, Hong-Bo; Lei, Qing; Du, Ji-Xiang; Huang, Min

doi:10.1007/s11042-022-13146-x

Published: 29 April 2022

Multimedia Tools and Applications, volume 81, pages 39453–39470 (2022)

Wei-Mo Deng¹,
Hong-Bo Zhang¹ (ORCID: orcid.org/0000-0001-5536-5224),
Qing Lei²,
Ji-Xiang Du³ &
Min Huang⁴

Abstract

Human-object interaction (HOI) detection is a core problem in human-centric scene understanding; its goal is to infer <human, verb, object> triplets that describe how humans and objects interact. Previous works mainly determine the interaction of each human-object pair by performing joint inference over multiple features. In this paper, we design a more discriminative representation of the human-object pair and a more effective HOI detection model. On the one hand, we use human poses as an attention mechanism to strengthen features, which is a novel way to handle human poses in HOI detection. On the other hand, to represent objects more effectively, each object is encoded as a word vector, and the relation features of humans and objects are captured by a graph convolution network built on object word vectors and human appearance features. These relation features are also strengthened by the human pose attention mechanism. Our model yields favorable results compared to state-of-the-art HOI detection algorithms on two large-scale benchmark datasets, V-COCO and HICO-DET.
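The two ideas in the abstract, pose used as attention and a graph convolution over a human appearance feature and an object word vector, can be illustrated with a small NumPy sketch. This is not the authors' network: the Gaussian keypoint heatmap, the two-node graph, the normalized-adjacency layer, and every name and size below (`pose_attention_map`, `attend_and_pool`, `gcn_layer`, `sigma`, the random stand-in features) are assumptions chosen only to make the general mechanism concrete.

```python
import numpy as np

def pose_attention_map(keypoints, height, width, sigma=1.5):
    """Build a spatial attention map in [0, 1] from (x, y) keypoints (assumed form)."""
    ys, xs = np.mgrid[0:height, 0:width]
    attn = np.zeros((height, width))
    for x, y in keypoints:
        # One Gaussian bump per joint; keep the strongest response per cell.
        bump = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        attn = np.maximum(attn, bump)
    return attn

def attend_and_pool(feature_map, attn):
    """Reweight a (C, H, W) feature map by attn, then take the weighted spatial mean."""
    weighted = feature_map * attn[None, :, :]
    return weighted.sum(axis=(1, 2)) / (attn.sum() + 1e-8)

def gcn_layer(node_features, adjacency, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adjacency + np.eye(adjacency.shape[0])   # add self-loops
    deg_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]
    return np.maximum(a_norm @ node_features @ weight, 0.0)

rng = np.random.default_rng(0)
C, H, W, D = 8, 7, 7, 8                    # D (word-vector size) set equal to C for simplicity
human_map = rng.normal(size=(C, H, W))     # random stand-in for human appearance features
object_vec = rng.normal(size=D)            # random stand-in for an object word vector
keypoints = [(3, 1), (2, 3), (4, 3), (3, 5)]  # made-up joints in feature-map coordinates

attn = pose_attention_map(keypoints, H, W)
human_vec = attend_and_pool(human_map, attn)

nodes = np.stack([human_vec, object_vec])  # node 0: human, node 1: object
adjacency = np.array([[0.0, 1.0], [1.0, 0.0]])
relation = gcn_layer(nodes, adjacency, rng.normal(size=(D, 16)))
print(relation.shape)  # (2, 16)
```

With only two fully connected nodes and self-loops, every entry of the normalized adjacency is 0.5, so both output rows are the same mixture of the human and object features. A real model would presumably use learned projections, more nodes, and trained weights; the sketch only shows where pose attention and the word vector could enter the computation.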

Acknowledgements

The authors would like to thank the anonymous reviewers for their valuable and insightful comments on an earlier version of this manuscript.

Author information

Authors and Affiliations

School of Computer Science and Technology, Huaqiao University, Xiamen, 361000, China
Wei-Mo Deng & Hong-Bo Zhang
Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen, 361000, China
Qing Lei
Fujian Key Laboratory of Big Data Intelligence and Security, Huaqiao University, Xiamen, 361000, China
Ji-Xiang Du
College of Humanities, Xiamen University, Xiamen, 361000, China
Min Huang

Corresponding author

Correspondence to Hong-Bo Zhang.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the Natural Science Foundation of China [Nos. 61871196, 62001176, 61902330 and 61673186]; the National Key Research and Development Program of China [No. 2019YFC1604700]; the Natural Science Foundation of Fujian Province of China [Nos. 2019J01082 and 2020J01085]; and the Promotion Program for Young and Middle-aged Teacher in Science and Technology Research of Huaqiao University [ZQN-YX601].

About this article

Cite this article

Deng, WM., Zhang, HB., Lei, Q. et al. Pose attention and object semantic representation-based human-object interaction detection network. Multimed Tools Appl 81, 39453–39470 (2022). https://doi.org/10.1007/s11042-022-13146-x

Received: 23 February 2021
Revised: 23 April 2021
Accepted: 10 April 2022
Published: 29 April 2022
Issue Date: November 2022
DOI: https://doi.org/10.1007/s11042-022-13146-x
