Instance-level Object relation module for one-stage Object Detection

Multimedia Tools and Applications

Abstract

Leveraging contextual information at the instance level can improve the accuracy of object detection. However, state-of-the-art object detection systems still detect each target individually, without using contextual information. One reason is that contextual information is difficult to model. To address this problem, an object relation module built on one-stage object detectors helps the detectors learn the correlations between objects. The module extracts and fuses feature maps from various layers, including geometric, categorical, and appearance features. A transformation driven by a visual attention mechanism is then applied to generate instance-level primary object relation features. Furthermore, a lightweight subnet generates a new feature prediction layer from the primary relation features, and this layer is fused with the original detection layer to improve detection ability. The module requires neither excessive computation nor additional supervision, and it can be easily ported to different one-stage object detection frameworks. As a demonstration, the relation module is added to several one-stage object detectors (YOLO, RetinaNet, and FCOS) and evaluated on the MS-COCO benchmark dataset after training. The results show that the relation module effectively improves accuracy in one-stage object detection pipelines. Specifically, it gives a 2.4 AP improvement for YOLOv3, a 1.8 AP improvement for RetinaNet, and a 1.6 AP improvement for FCOS.
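
The general idea of attention-driven relation features can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the architecture, so the concatenation-based fusion, the scaled dot-product attention, the additive fusion with the original features, and every name here (`relation_features`, `w_q`, `w_k`, `w_v`, the feature sizes) are assumptions chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def relation_features(appearance, boxes, class_scores, w_q, w_k, w_v):
    """Hypothetical relation step over N detected instances.

    appearance:   (N, D) appearance feature per instance
    boxes:        (N, 4) geometric feature per instance (assumed normalized)
    class_scores: (N, C) categorical feature per instance
    w_q, w_k:     (D + 4 + C, d_key) projection matrices (assumed, random here)
    w_v:          (D + 4 + C, D) projection matrix (assumed, random here)
    """
    # Assumption: the three feature types are fused by simple concatenation.
    fused = np.concatenate([appearance, boxes, class_scores], axis=1)

    # Scaled dot-product attention between every pair of instances.
    queries = fused @ w_q
    keys = fused @ w_k
    values = fused @ w_v
    attention = softmax(queries @ keys.T / np.sqrt(queries.shape[1]), axis=1)

    # Each instance's relation feature is an attention-weighted sum over all instances.
    relation = attention @ values

    # Assumption: relation features are fused with the originals by addition.
    return appearance + relation, attention


# Toy inputs: 5 instances, 16-d appearance features, 3 classes.
rng = np.random.default_rng(0)
n, d_app, n_cls, d_key = 5, 16, 3, 8
appearance = rng.normal(size=(n, d_app))
boxes = rng.uniform(size=(n, 4))
class_scores = softmax(rng.normal(size=(n, n_cls)), axis=1)

fused_dim = d_app + 4 + n_cls
w_q = rng.normal(size=(fused_dim, d_key)) * 0.1
w_k = rng.normal(size=(fused_dim, d_key)) * 0.1
w_v = rng.normal(size=(fused_dim, d_app)) * 0.1

enhanced, attention = relation_features(appearance, boxes, class_scores, w_q, w_k, w_v)
print(enhanced.shape, attention.shape)
```

In a trained detector the projection matrices would be learned jointly with the detection loss rather than drawn at random, which is consistent with the abstract's statement that no additional supervision is required.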

Corresponding author

Correspondence to Jin Han.

Cite this article

Rong, W., Han, J. & Liu, G. Instance-level Object relation module for one-stage Object Detection. Multimed Tools Appl 81, 8617–8632 (2022). https://doi.org/10.1007/s11042-022-12264-w

  • DOI: https://doi.org/10.1007/s11042-022-12264-w
