Abstract
In object detection, the complexity of real-world scenes means that objects are often occluded and semantically ambiguous. Existing CNN-based object detectors focus only on the information within each region proposal and ignore the auxiliary role of object-object relationships, which makes it difficult to distinguish hard samples in complex scenes. Accordingly, in this paper we propose a novel relation-aware graph reasoning network (RGRN) that adaptively discovers and integrates key semantic and spatial relationships in images. Specifically, to realize information interaction and relational reasoning between nodes, we design two parallel modules: the semantic relational reasoning module (SRRM) and the spatial relational reasoning module (SPRM). SRRM mines the semantic relationships between objects by measuring the semantic similarity between graph nodes, and SPRM finds the spatial relationships between objects from the relative positions of the nodes. Our method considers both the relative spatial location and the semantic correlation between objects, and it can be easily embedded in existing networks to improve performance in real time. Experiments verify the effectiveness of our method, which achieves improvements of around 16% on MS COCO and 10% on PASCAL VOC in terms of mAP and outperforms state-of-the-art relation-based methods.
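The two-branch idea in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the cosine similarity measure, the distance-based spatial weighting, the softmax normalization, the residual fusion, and all function names (`semantic_adjacency`, `spatial_adjacency`, `propagate`, `relation_enhance`) are assumptions chosen only to show how semantic and spatial edge weights over proposal nodes might be computed in parallel and used to update node features.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def softmax(row):
    """Numerically stable softmax, so each node's edge weights sum to 1."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def semantic_adjacency(features):
    """Semantic branch (assumed form): edges weighted by feature similarity."""
    return [softmax([cosine(f, g) for g in features]) for f in features]


def spatial_adjacency(boxes):
    """Spatial branch (assumed form): edges weighted by relative center distance.

    Boxes are (x1, y1, x2, y2). Distances are divided by the source box's
    size so that the weighting does not depend on image resolution.
    """
    centers = [((x1 + x2) / 2, (y1 + y2) / 2) for x1, y1, x2, y2 in boxes]
    scales = [math.sqrt(abs((x2 - x1) * (y2 - y1))) or 1.0 for x1, y1, x2, y2 in boxes]
    return [
        softmax([-math.hypot(cx - ox, cy - oy) / s for ox, oy in centers])
        for (cx, cy), s in zip(centers, scales)
    ]


def propagate(features, adjacency):
    """One message-passing step: each node becomes a weighted sum of all nodes."""
    dim = len(features[0])
    return [
        [sum(w * f[d] for w, f in zip(row, features)) for d in range(dim)]
        for row in adjacency
    ]


def relation_enhance(features, boxes):
    """Run both branches in parallel and add their context to the input (assumed fusion)."""
    sem = propagate(features, semantic_adjacency(features))
    spa = propagate(features, spatial_adjacency(boxes))
    return [
        [f + a + b for f, a, b in zip(f_row, s_row, p_row)]
        for f_row, s_row, p_row in zip(features, sem, spa)
    ]


# Three hypothetical proposals: two similar, nearby objects and one distant, different one.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
boxes = [(0, 0, 10, 10), (12, 0, 22, 10), (100, 100, 110, 110)]
enhanced = relation_enhance(features, boxes)
```

In this toy setup the first proposal receives more weight from the second (similar and close) than from the third (dissimilar and far) in both branches, which is the kind of contextual cue the abstract describes; a trained model would learn these weightings rather than fix them by hand.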
Data availability statement
The data that support the findings of this study are available from Chu J upon reasonable request.
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 62162045 and 61866028) and the Technology Innovation Guidance Program Project of Jiangxi Province (Special Project of Technology Cooperation) (Grant No. 20212BDH81003).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Zhao, J., Chu, J., Leng, L. et al. RGRN: Relation-aware graph reasoning network for object detection. Neural Comput & Applic 35, 16671–16688 (2023). https://doi.org/10.1007/s00521-023-08550-9
DOI: https://doi.org/10.1007/s00521-023-08550-9