Abstract
Most object detectors consider only the features within region proposals, taking neither the global context nor the relationships between objects into account, which inevitably limits detection performance. To tackle this problem, we introduce a Cross-Scale Dynamic Relation Network (CSDRN) that explores the relationships between objects in an image. Its core components are a Cross-Scale Semantic-Aware Module (CSSAM), Dynamic Relation Graph Reasoning (DRGR), and a Semantic Attention Fusion Module (SAFM). Through the CSSAM, the crucial information in feature maps of different scales undergoes semantic interaction to yield a cross-scale semantic feature. We then activate the category knowledge in the image and combine it with the cross-scale semantic feature to build a dynamic relation graph, from which more precise relations between objects are obtained. Guided by these relations, a semantic attention map is generated to enrich the visual features. Experimental results on the COCO dataset show that the proposed CSDRN effectively improves detection performance, reaching 54.8% box AP, which is 3.9% box AP above the baseline. Moreover, it achieves 47.6% mask AP in instance segmentation, exceeding the baseline by 3.6% mask AP.
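The pipeline the abstract describes — building a dynamic relation graph over proposal features, reasoning over it, and using category knowledge to generate a semantic attention that enriches the visual features — can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the function `relation_enhance`, its similarity-based adjacency, and the sigmoid gating are all assumptions made for illustration.

```python
import numpy as np


def _softmax(x, axis=-1):
    """Numerically stable row-wise softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def relation_enhance(feats, class_emb):
    """Toy relation reasoning over N proposal features of dim D.

    feats:     (N, D) region-proposal features
    class_emb: (C, D) category-knowledge embeddings
    returns:   (N, D) relation-enriched features
    """
    # 1) Dynamic relation graph: adjacency from pairwise cosine similarity.
    normed = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    adj = _softmax(normed @ normed.T, axis=1)          # (N, N)

    # 2) Graph reasoning: aggregate each proposal's neighbours along the graph.
    relational = adj @ feats                           # (N, D)

    # 3) Semantic attention: project proposals onto category knowledge,
    #    then gate the relational context with the resulting semantics.
    cls_attn = _softmax(feats @ class_emb.T, axis=1)   # (N, C)
    semantic = cls_attn @ class_emb                    # (N, D)
    gate = 1.0 / (1.0 + np.exp(-semantic))             # sigmoid gate in (0, 1)

    # 4) Fusion: original visual features enriched by gated relational context.
    return feats + gate * relational


rng = np.random.default_rng(0)
out = relation_enhance(rng.standard_normal((5, 8)), rng.standard_normal((3, 8)))
```

The real CSDRN additionally performs the cross-scale interaction (CSSAM) before this step and learns the graph and attention weights end to end; the sketch only shows how relation-guided attention can modulate proposal features.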
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Nos. 62276073, 61966004), the Guangxi Natural Science Foundation (No. 2019GXNSFDA245018), the Guangxi "Bagui Scholar" Teams for Innovation and Research Project, the Innovation Project of Guangxi Graduate Education (Nos. YCBZ2023055, YCBZ2022060), and the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Zhong, X., Li, Z. (2024). Cross-scale Dynamic Relation Network for Object Detection. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds) PRICAI 2023: Trends in Artificial Intelligence. PRICAI 2023. Lecture Notes in Computer Science(), vol 14325. Springer, Singapore. https://doi.org/10.1007/978-981-99-7019-3_31
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7018-6
Online ISBN: 978-981-99-7019-3