Abstract
Most object detectors consider only the features within region proposals, taking neither the global context nor the relationships between objects into account, which inevitably limits detection performance. To tackle this problem, we introduce a Cross-Scale Dynamic Relation Network (CSDRN) that explores the relationships between objects in an image. Its core components are a Cross-Scale Semantic-Aware Module (CSSAM), Dynamic Relation Graph Reasoning (DRGR), and a Semantic Attention Fusion Module (SAFM). Through the CSSAM, the crucial information in feature maps of different scales undergoes semantic interaction to yield a cross-scale semantic feature. We then activate the category knowledge in the image and combine it with the cross-scale semantic feature to build a dynamic relation graph, from which more precise relations between objects are obtained. Guided by these relations, a semantic attention map is generated to enrich the visual features. Experimental results on the COCO dataset show that the proposed CSDRN effectively improves detection performance, reaching 54.8% box AP, which is 3.9% box AP above the baseline. Moreover, it achieves 47.6% mask AP in instance segmentation, exceeding the baseline by 3.6% mask AP.
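The pipeline the abstract describes — building a dynamic relation graph over proposal features, reasoning over it, and using category knowledge to generate a semantic attention that enriches the visual features — can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the function `relation_enhance`, its similarity-based adjacency, and the sigmoid gating are all assumptions made for illustration.

```python
import numpy as np


def _softmax(x, axis=-1):
    """Numerically stable row-wise softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def relation_enhance(feats, class_emb):
    """Toy relation reasoning over N proposal features of dim D.

    feats:     (N, D) region-proposal features
    class_emb: (C, D) category-knowledge embeddings
    returns:   (N, D) relation-enriched features
    """
    # 1) Dynamic relation graph: adjacency from pairwise cosine similarity.
    normed = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    adj = _softmax(normed @ normed.T, axis=1)          # (N, N)

    # 2) Graph reasoning: aggregate each proposal's neighbours along the graph.
    relational = adj @ feats                           # (N, D)

    # 3) Semantic attention: project proposals onto category knowledge,
    #    then gate the relational context with the resulting semantics.
    cls_attn = _softmax(feats @ class_emb.T, axis=1)   # (N, C)
    semantic = cls_attn @ class_emb                    # (N, D)
    gate = 1.0 / (1.0 + np.exp(-semantic))             # sigmoid gate in (0, 1)

    # 4) Fusion: original visual features enriched by gated relational context.
    return feats + gate * relational


rng = np.random.default_rng(0)
out = relation_enhance(rng.standard_normal((5, 8)), rng.standard_normal((3, 8)))
```

The real CSDRN additionally performs the cross-scale interaction (CSSAM) before this step and learns the graph and attention weights end to end; the sketch only shows how relation-guided attention can modulate proposal features.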
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Nos. 62276073, 61966004), the Guangxi Natural Science Foundation (No. 2019GXNSFDA245018), the Guangxi "Bagui Scholar" Teams for Innovation and Research Project, the Innovation Project of Guangxi Graduate Education (Nos. YCBZ2023055, YCBZ2022060), and the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Zhong, X., Li, Z. (2024). Cross-scale Dynamic Relation Network for Object Detection. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds) PRICAI 2023: Trends in Artificial Intelligence. PRICAI 2023. Lecture Notes in Computer Science(), vol 14325. Springer, Singapore. https://doi.org/10.1007/978-981-99-7019-3_31
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7018-6
Online ISBN: 978-981-99-7019-3