Cross-scale Dynamic Relation Network for Object Detection

  • Conference paper
PRICAI 2023: Trends in Artificial Intelligence (PRICAI 2023)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 14325)


Abstract

Most object detectors consider only the features inside region proposals, taking neither the global context nor the relationships between objects into account, which inevitably limits detection performance. To tackle this problem, we introduce a Cross-Scale Dynamic Relation Network (CSDRN) that explores the relationships between specific objects in an image. Its core components are a Cross-Scale Semantic-Aware Module (CSSAM), Dynamic Relation Graph Reasoning (DRGR), and a Semantic Attention Fusion Module (SAFM). Through the CSSAM, the crucial information in feature maps of different scales interacts semantically to produce a cross-scale semantic feature. We activate the category knowledge in the image and combine it with the cross-scale semantic feature to build a dynamic relation graph, yielding more precise relations between objects. Guided by these relations, a semantic attention is generated to enrich the visual features. Experimental results on the COCO dataset show that the proposed CSDRN effectively improves detection performance, reaching 54.8% box AP, 3.9% box AP above the baseline. Moreover, it achieves 47.6% mask AP in instance segmentation, exceeding the baseline by 3.6% mask AP.
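The relation-reasoning idea in the abstract can be sketched as one round of attention-style message passing over proposal features, with category knowledge mixed into the graph nodes and a gated residual fusion back into the visual features. Everything below (the shapes, the random projections `w_q`/`w_k`/`w_v`, and the sigmoid gating form) is an illustrative assumption, not the paper's actual DRGR/SAFM implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dynamic_relation_reasoning(feats, cat_embed, rng=None):
    """Sketch of graph-based relation reasoning over region proposals.

    feats:     (N, d) visual features of N region proposals
    cat_embed: (N, d) per-proposal category-knowledge embeddings
    Returns relation-enhanced features of shape (N, d).
    """
    rng = rng or np.random.default_rng(0)
    n, d = feats.shape
    # Hypothetical learned projections (random here for illustration).
    w_q = rng.standard_normal((d, d)) / np.sqrt(d)
    w_k = rng.standard_normal((d, d)) / np.sqrt(d)
    w_v = rng.standard_normal((d, d)) / np.sqrt(d)

    # Build a dynamic relation graph: nodes are proposals, and edge
    # weights come from similarity of category-aware node representations.
    nodes = feats + cat_embed
    adj = softmax((nodes @ w_q) @ (nodes @ w_k).T / np.sqrt(d), axis=-1)

    # One step of message passing along the relation graph (ReLU).
    messages = np.maximum(adj @ (nodes @ w_v), 0.0)

    # Attention-style fusion: gate the messages with a sigmoid and add
    # them back to the original visual features (residual enrichment).
    gate = 1.0 / (1.0 + np.exp(-messages))
    return feats + gate * messages

feats = np.random.default_rng(1).standard_normal((5, 16))
cats = np.random.default_rng(2).standard_normal((5, 16))
out = dynamic_relation_reasoning(feats, cats)
print(out.shape)  # (5, 16)
```

The row-softmax over pairwise similarities makes each proposal attend to the proposals most related to it, so the "graph" is recomputed per image rather than fixed, which is the sense in which the relation graph is dynamic.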



Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 62276073, 61966004), the Guangxi Natural Science Foundation (No. 2019GXNSFDA245018), the Guangxi "Bagui Scholar" Teams for Innovation and Research Project, the Innovation Project of Guangxi Graduate Education (Nos. YCBZ2023055, YCBZ2022060), and the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.

Author information

Correspondence to Zhixin Li.

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Zhong, X., Li, Z. (2024). Cross-scale Dynamic Relation Network for Object Detection. In: Liu, F., Sadanandan, A.A., Pham, D.N., Mursanto, P., Lukose, D. (eds) PRICAI 2023: Trends in Artificial Intelligence. PRICAI 2023. Lecture Notes in Computer Science, vol 14325. Springer, Singapore. https://doi.org/10.1007/978-981-99-7019-3_31

  • DOI: https://doi.org/10.1007/978-981-99-7019-3_31

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-7018-6

  • Online ISBN: 978-981-99-7019-3

  • eBook Packages: Computer Science (R0)
