Abstract
Visual relation detection (VRD) aims to describe an image with visual relation triplets of the form <subject, predicate, object>. As an extension of the traditional VRD task, visual relation of interest detection (VROID) was proposed to obtain visual relations of interest, i.e., visual relations that are semantically important for expressing the main content of an image. In this paper, we propose a complete interest propagation from part (CIPFP) method for VROID, which exploits semantic parts and propagates interest along the part-instance-relation chain. Specifically, CIPFP propagates interest along six paths: from parts to part pairs, from parts to instances, from part pairs to instance pairs, from instances to instance pairs, from parts to relation triplets, and from instance pairs to relation triplets. We conduct extensive experiments to validate the effectiveness of the CIPFP method and of its individual components.
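To make the six propagation paths concrete, here is a toy sketch in Python. It is not the CIPFP model: the abstract does not say how interest is combined, so the part scores, the aggregation rules (product, max, mean, equal 0.5 weights), and every function name below are hypothetical. The sketch only shows the direction of flow, from parts to part pairs and instances, then to instance pairs, and finally to relation triplets.

```python
from itertools import product
from statistics import mean

# Hypothetical toy input (not from the paper): each semantic part maps to
# (owning instance, part-level interest score in [0, 1]).
Parts = dict[str, tuple[str, float]]


def parts_of(parts: Parts, inst: str) -> list[str]:
    """Return the names of all parts that belong to one instance."""
    return [p for p, (owner, _) in parts.items() if owner == inst]


def part_pair_interest(parts: Parts, p: str, q: str) -> float:
    # Path 1, parts -> part pairs (assumed rule: product of the two scores).
    return parts[p][1] * parts[q][1]


def instance_interest(parts: Parts, inst: str) -> float:
    # Path 2, parts -> instances (assumed rule: max over the instance's parts).
    return max(parts[p][1] for p in parts_of(parts, inst))


def instance_pair_interest(parts: Parts, subj: str, obj: str) -> float:
    # Path 3, part pairs -> instance pairs (assumed: best cross-instance pair).
    from_part_pairs = max(
        part_pair_interest(parts, p, q)
        for p, q in product(parts_of(parts, subj), parts_of(parts, obj))
    )
    # Path 4, instances -> instance pairs (assumed: product of instance scores).
    from_instances = instance_interest(parts, subj) * instance_interest(parts, obj)
    return 0.5 * from_part_pairs + 0.5 * from_instances


def triplet_interest(parts: Parts, triplet: tuple[str, str, str]) -> float:
    subj, _predicate, obj = triplet  # the predicate is ignored in this toy
    # Path 5, parts -> relation triplets (assumed: mean of all involved parts).
    from_parts = mean(parts[p][1] for p in parts_of(parts, subj) + parts_of(parts, obj))
    # Path 6, instance pairs -> relation triplets.
    from_pair = instance_pair_interest(parts, subj, obj)
    return 0.5 * from_parts + 0.5 * from_pair


if __name__ == "__main__":
    toy_parts: Parts = {
        "person.head": ("person", 0.9),
        "person.hand": ("person", 0.7),
        "horse.head": ("horse", 0.6),
        "horse.leg": ("horse", 0.4),
        "dog.head": ("dog", 0.3),
    }
    candidates = [("person", "ride", "horse"), ("dog", "near", "horse")]
    # Rank candidate triplets by propagated interest, most interesting first.
    ranked = sorted(candidates, key=lambda t: triplet_interest(toy_parts, t), reverse=True)
    print(ranked[0])
```

With these made-up scores, ('person', 'ride', 'horse') is ranked above ('dog', 'near', 'horse') because the person and horse parts carry higher interest. Keeping each path as a separate term makes it easy to switch one off, in the spirit of the component-level validation the abstract mentions.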



Acknowledgements
This work is supported by National Science Foundation of China (62072232), Natural Science Foundation of Jiangsu Province (BK20191248) and Collaborative Innovation Center of Novel Software Technology and Industrialization.
About this article
Cite this article
Zhou, Y., Yu, F. Complete interest propagation from part for visual relation of interest detection. Int. J. Mach. Learn. & Cyber. 14, 455–465 (2023). https://doi.org/10.1007/s13042-022-01603-w