Abstract
In this work, we study the task of detecting human-object interactions (HOI) from images, defined as detecting triplets of (human, predicate, object). A common practice in the literature is to first localize human and object instances and then infer the triplets, or the predicates alone, as a classification task over the detected human-object pairs. A data sparsity issue arises when inferring triplets because of the severe data imbalance among HOI classes, while a data variance issue arises when inferring predicates alone, since a predicate can carry different semantic meanings when applied to different objects. To resolve these problems, we propose to decompose HOI classes that share the same predicate into several semantic groups based on the appearance, semantic information, and function of the objects. In this way, semantically related HOI classes are grouped together to mitigate the data sparsity issue, while visually and functionally less related HOI classes are separated to relieve the data variance issue. We further show that multiple levels of decomposition at different granularities provide richer auxiliary information that boosts performance. We implement this idea with a multi-branch graph network, in which the branches make classifications based on different levels of decomposition. We evaluate our method on the popular HICO-Det dataset. Experimental results show that our method achieves state-of-the-art performance.
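The core idea above, decomposing HOI classes that share a predicate into semantic groups of objects, can be illustrated with a small sketch (not the authors' code). Here, toy object embeddings are clustered into groups, and each fine-grained HOI class (predicate, object) is mapped to a coarser class (predicate, group); the vectors, the group count, and the `coarse_class` helper are all hypothetical stand-ins for the paper's embedding-based grouping.

```python
# Illustrative sketch: group objects by clustering toy embedding vectors, then
# map each (predicate, object) HOI class onto a coarser (predicate, group) class.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 4-d "word vectors" for object categories; a real system would use
# pretrained embeddings (e.g. word2vec) capturing appearance/semantics/function.
object_vecs = {
    "horse":      np.array([0.9, 0.1, 0.0, 0.1]),
    "elephant":   np.array([0.8, 0.2, 0.1, 0.1]),
    "bicycle":    np.array([0.1, 0.9, 0.1, 0.0]),
    "motorcycle": np.array([0.2, 0.8, 0.2, 0.1]),
}

names = list(object_vecs)
X = np.stack([object_vecs[n] for n in names])

# Cluster objects into semantic groups (here k=2: roughly animals vs. vehicles).
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
groups = {n: int(g) for n, g in zip(names, labels)}

def coarse_class(predicate, obj):
    """Map a fine HOI class (predicate, object) to a coarser (predicate, group)."""
    return (predicate, groups[obj])

# "ride horse" and "ride elephant" now share one coarse class, pooling their
# scarce training data, while "ride bicycle" stays in a separate group, so the
# two distinct senses of "ride" are not conflated.
assert coarse_class("ride", "horse") == coarse_class("ride", "elephant")
assert coarse_class("ride", "horse") != coarse_class("ride", "bicycle")
```

Running the grouping at several values of k would yield the multiple decomposition granularities the paper feeds to different branches of its graph network.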
Acknowledgement
This work is supported by the National Key Research and Development Project under Grant No. 2018AAA0100802.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Wu, T., Zhang, X., Duan, F., Chang, L. (2021). Multi-branch Graph Network for Learning Human-Object Interaction. In: Ma, H., et al. (eds.) Pattern Recognition and Computer Vision. PRCV 2021. Lecture Notes in Computer Science, vol. 13022. Springer, Cham. https://doi.org/10.1007/978-3-030-88013-2_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-88012-5
Online ISBN: 978-3-030-88013-2