Abstract
Using relational representations to aid object detection has attracted growing research attention in recent years. However, previous studies mainly focus on relationships within the region proposals or within the label embeddings, and pay less attention to the relationships between them. To fill this gap, we propose a novel object detection framework that explores the relationships across the visual feature space and the label embedding space to facilitate proposal classification. Specifically, we model the region proposals and class labels as a unified relation graph, in which the extracted proposals and labels are nodes and each proposal-label pair is connected by an assignment edge. This converts the problem of classifying proposals into the problem of selecting reliable edges from the constructed relation graph. Furthermore, a graph convolutional module is developed to perform relational reasoning on the graph; it predicts a label for each assignment edge indicating whether the corresponding classification is reliable. The updated relational representations of the proposals are used for bounding box regression. Embedding our framework into state-of-the-art baselines, we perform extensive comparison experiments on two public benchmarks, Pascal VOC and COCO2017. The experimental results demonstrate the flexibility and effectiveness of the proposed framework.
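As a rough illustration of the formulation described in the abstract, the sketch below builds the proposal-label assignment edges and scores each edge. It is not the authors' implementation: the function names, the element-wise product used as the edge feature, the mean aggregation used for the single message-passing step, and the random weights are all assumptions made for this example.

```python
import numpy as np

def build_assignment_edges(num_proposals, num_labels):
    """Return an (N*C, 2) array of (proposal_idx, label_idx) pairs,
    one assignment edge per proposal-label pair."""
    p, c = np.meshgrid(np.arange(num_proposals), np.arange(num_labels), indexing="ij")
    return np.stack([p.ravel(), c.ravel()], axis=1)

def score_edges(proposal_feats, label_embeds, w_p, w_l, w_out):
    """Score every assignment edge (hypothetical scoring scheme).

    proposal_feats: (N, Dp) visual features of the region proposals
    label_embeds:   (C, Dl) label embeddings (e.g. word vectors)
    w_p, w_l:       projections into a shared D-dimensional space
    w_out:          (D,) vector mapping an edge feature to a logit
    """
    hp = np.tanh(proposal_feats @ w_p)            # (N, D) proposal nodes
    hl = np.tanh(label_embeds @ w_l)              # (C, D) label nodes
    # Assumed edge feature: element-wise product of the two endpoints.
    edges = hp[:, None, :] * hl[None, :, :]       # (N, C, D)
    # One assumed message-passing step: each proposal node adds the
    # mean of its incident edge features to its own representation.
    hp_updated = hp + edges.mean(axis=1)          # (N, D)
    edges = hp_updated[:, None, :] * hl[None, :, :]
    logits = edges @ w_out                        # (N, C)
    scores = 1.0 / (1.0 + np.exp(-logits))        # edge reliability in (0, 1)
    return scores, hp_updated

def select_labels(scores):
    """Pick, for each proposal, the label whose assignment edge scores highest."""
    return scores.argmax(axis=1)

# Toy usage with random inputs and untrained weights.
rng = np.random.default_rng(0)
N, C, Dp, Dl, D = 4, 3, 16, 10, 8
scores, updated = score_edges(
    rng.normal(size=(N, Dp)), rng.normal(size=(C, Dl)),
    rng.normal(size=(Dp, D)), rng.normal(size=(Dl, D)), rng.normal(size=D),
)
labels = select_labels(scores)
```

In this sketch, `scores[i, j]` plays the role of the per-edge reliability prediction, and `updated` stands in for the updated proposal representations that the abstract says are passed to bounding box regression.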






Acknowledgements
This work is supported by the Fundamental Research Funds for the Central Universities (2019YJS044), the National Natural Science Foundation of China (Nos. 62076021 and 61872032), and the Beijing Municipal Natural Science Foundation (No. 4202060).
Cite this article
You, X., Liu, H., Wang, T. et al. Object detection by crossing relational reasoning based on graph neural network. Machine Vision and Applications 33, 1 (2022). https://doi.org/10.1007/s00138-021-01257-8
DOI: https://doi.org/10.1007/s00138-021-01257-8