
Complete interest propagation from part for visual relation of interest detection

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Visual relation detection (VRD) describes an image with visual relation triplets of the form <subject, predicate, object>. Visual relation of interest detection (VROID) extends the traditional VRD task: it aims to obtain visual relations of interest, i.e., visual relations that are semantically important for expressing the main content of an image. In this paper, we propose a complete interest propagation from part (CIPFP) method for VROID, which exploits semantic parts and propagates interest along the part-instance-relation structure. Specifically, CIPFP propagates interest along six paths: from parts to part pairs, from parts to instances, from part pairs to instance pairs, from instances to instance pairs, from parts to relation triplets, and from instance pairs to relation triplets. We conduct extensive experiments to validate the effectiveness of the CIPFP method and of its components.
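To make the six propagation paths concrete, the following toy sketch walks through them in order on hand-written scores. It is not the CIPFP model: the abstract does not specify how interest is computed or combined, so the input scores, the function name `propagate`, and every aggregation rule (mean, product, max, averaging) are assumptions chosen only to illustrate the direction of each path. Predicate classification is left out; each score is attached to the candidate triplet for an ordered (subject, object) pair.

```python
from itertools import permutations
from statistics import mean

# Hypothetical input: per-instance semantic parts with interest scores in [0, 1].
part_interest = {
    "person": {"head": 0.9, "hand": 0.7},
    "horse": {"head": 0.6, "leg": 0.4},
    "tree": {"trunk": 0.1},
}


def propagate(part_interest):
    """Toy interest propagation along part -> instance -> relation.

    All aggregation rules here are illustrative assumptions,
    not the learned modules of the paper.
    """
    # Path 1: parts -> instances (assumed: mean over an instance's parts).
    instance = {name: mean(parts.values()) for name, parts in part_interest.items()}

    triplet = {}
    for subj, obj in permutations(part_interest, 2):
        subj_parts = list(part_interest[subj].values())
        obj_parts = list(part_interest[obj].values())

        # Path 2: parts -> part pairs (assumed: product of the two part scores).
        part_pairs = [ps * po for ps in subj_parts for po in obj_parts]

        # Path 3: part pairs -> instance pairs (assumed: strongest part pair).
        from_part_pairs = max(part_pairs)
        # Path 4: instances -> instance pairs (assumed: product of instance scores).
        from_instances = instance[subj] * instance[obj]
        instance_pair = (from_part_pairs + from_instances) / 2

        # Path 5: parts -> relation triplets (assumed: mean over all involved parts).
        from_parts = mean(subj_parts + obj_parts)
        # Path 6: instance pairs -> relation triplets (assumed: equal-weight average).
        triplet[(subj, obj)] = (from_parts + instance_pair) / 2

    return instance, triplet


instance, triplet = propagate(part_interest)
# Rank candidate (subject, object) pairs by propagated interest, highest first.
ranking = sorted(triplet, key=triplet.get, reverse=True)
```

In this sketch the scores happen to be symmetric in subject and object, which a real model need not be. The point is only that a triplet's final interest draws on evidence from both the part level and the instance-pair level, so a pair of instances with salient parts ranks above a pair involving a low-interest background object.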



Acknowledgements

This work is supported by the National Science Foundation of China (62072232), the Natural Science Foundation of Jiangsu Province (BK20191248), and the Collaborative Innovation Center of Novel Software Technology and Industrialization.

Author information

Corresponding author

Correspondence to Fan Yu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhou, Y., Yu, F. Complete interest propagation from part for visual relation of interest detection. Int. J. Mach. Learn. & Cyber. 14, 455–465 (2023). https://doi.org/10.1007/s13042-022-01603-w

  • DOI: https://doi.org/10.1007/s13042-022-01603-w
