Instance-level Object relation module for one-stage Object Detection

Rong, Wenzhong; Han, Jin; Liu, Gen

doi:10.1007/s11042-022-12264-w

Published: 04 February 2022

Volume 81, pages 8617–8632, (2022)

Multimedia Tools and Applications

Abstract

Leveraging contextual information at the instance level can improve the accuracy of object detection. However, state-of-the-art object detection systems still detect each target individually without using contextual information. One reason is that contextual information is difficult to model. To address this problem, an object relation module built on one-stage object detectors helps the detectors learn the correlations between objects. It extracts and fuses feature maps from various layers, including geometric, categorical, and appearance features. A transformation driven by a visual attention mechanism is then performed to generate instance-level primary object relation features. Furthermore, a lightweight subnet generates a new feature prediction layer based on the primary relation features, which is fused with the original detection layer to improve detection ability. The module requires neither excessive computation nor additional supervision, and it can be easily ported to different one-stage object detection frameworks. As a demonstration, the relation module is added to several one-stage object detectors (YOLO, RetinaNet, and FCOS), which are trained and then evaluated on the MS-COCO benchmark dataset. The results show that the relation module effectively improves accuracy in one-stage object detection pipelines. Specifically, the relation module gives a 2.4 AP improvement for YOLOv3, a 1.8 AP improvement for RetinaNet, and a 1.6 AP improvement for FCOS.
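The attention-driven step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain scaled dot-product attention with no learned projections as a stand-in for the paper's transformation, assumes each instance is already represented by one fused feature vector, and uses element-wise addition for the final fusion. The function names `relation_features` and `fuse` and the toy feature values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relation_features(instances):
    """Return one attention-weighted context vector per instance.

    `instances` is a list of equal-length feature vectors. Assumption:
    appearance, geometric and categorical features are already fused.
    """
    dim = len(instances[0])
    scale = math.sqrt(dim)
    related = []
    for query in instances:
        # Scaled dot-product similarity to every instance, itself included.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in instances
        ]
        weights = softmax(scores)
        # The relation feature is the weighted sum of all instance features.
        context = [
            sum(w * inst[d] for w, inst in zip(weights, instances))
            for d in range(dim)
        ]
        related.append(context)
    return related


def fuse(original, relation):
    """Add relation features to the original features (assumed fusion)."""
    return [
        [o + r for o, r in zip(orig_vec, rel_vec)]
        for orig_vec, rel_vec in zip(original, relation)
    ]


# Toy input: three instances with three-dimensional fused features.
instances = [[1.0, 0.0, 0.5], [0.9, 0.1, 0.4], [0.0, 1.0, 0.2]]
enhanced = fuse(instances, relation_features(instances))
```

In this sketch each instance attends to all others, so similar instances contribute more to each other's context vector. A real module would likely learn the query, key and value projections and operate on convolutional feature maps rather than on short lists.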

Author information

Authors and Affiliations

College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao, 266590, China
Wenzhong Rong, Jin Han & Gen Liu

Corresponding author

Correspondence to Jin Han.

About this article

Cite this article

Rong, W., Han, J. & Liu, G. Instance-level Object relation module for one-stage Object Detection. Multimed Tools Appl 81, 8617–8632 (2022). https://doi.org/10.1007/s11042-022-12264-w

Received: 11 March 2021
Revised: 24 June 2021
Accepted: 14 January 2022
Published: 04 February 2022
Issue Date: March 2022
DOI: https://doi.org/10.1007/s11042-022-12264-w
