Detail injection with heterogeneous composite backbone network for object detection

Yan, Zhiwei; Zheng, Huicheng; Li, Ye

doi:10.1007/s11042-022-12241-3

Published: 18 February 2022

Multimedia Tools and Applications, volume 81, pages 11621–11637 (2022)

Zhiwei Yan^1,2,3, Huicheng Zheng^1,2,3 (ORCID: 0000-0002-6729-4176) & Ye Li^1,2,3,4

Abstract

Current detectors usually rely on backbone networks that were originally designed for image classification and pretrained on large image classification datasets, which makes them well suited to modeling global information. As a consequence, most detectors struggle to detect small objects, because the local spatial details that are critical for accurate localization are lost rapidly. In this work, we propose a backbone network, called the heterogeneous composite backbone, which joins two backbones with diverse structures. It uses the deep features generated by an off-the-shelf classification-oriented backbone network for global information extraction, and it also benefits from our redesigned detail extraction backbone network, which yields features with more detailed spatial information. The new backbone is shown to be beneficial for modeling fine-grained local information. Furthermore, to guarantee that the features from the randomly initialized detail extraction network are not suppressed during end-to-end training, we explore a new training scheme that combines features from a pretrained deep backbone with features generated by a network trained nearly from scratch. Experiments on benchmark datasets including PASCAL VOC and MS COCO demonstrate that the proposed backbone network can achieve considerable improvements in object detection.
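The loss of spatial detail described in the abstract can be illustrated with simple stride arithmetic. The sketch below is not taken from the article: the stage names, the stride values (4, 8, 16, 32, typical of ResNet-like classification backbones), and the shallower "detail" branch are all assumptions chosen for illustration. It only shows how few feature-map cells a small object covers once the input has been downsampled heavily, and how a branch with smaller strides would retain more of them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    stride: int  # cumulative downsampling factor relative to the input image


def feature_map_size(image_size: int, stride: int) -> int:
    """Side length of the feature map, assuming padding that rounds up (ceil division)."""
    return -(-image_size // stride)


def object_footprint(object_size: float, stride: int) -> float:
    """Approximate number of feature-map cells an object spans along one side."""
    return object_size / stride


# Assumed strides, typical of ResNet-like classification backbones.
CLASSIFICATION_STAGES = [Stage("C2", 4), Stage("C3", 8), Stage("C4", 16), Stage("C5", 32)]

# Hypothetical detail branch that downsamples less aggressively (illustrative values only).
DETAIL_STAGES = [Stage("D1", 2), Stage("D2", 4), Stage("D3", 8)]


def summarize(stages, image_size, object_size):
    """Return (stage name, feature map side, object footprint in cells) for each stage."""
    return [
        (s.name, feature_map_size(image_size, s.stride), object_footprint(object_size, s.stride))
        for s in stages
    ]


if __name__ == "__main__":
    branches = (("classification", CLASSIFICATION_STAGES), ("detail", DETAIL_STAGES))
    for label, stages in branches:
        for name, size, cells in summarize(stages, image_size=512, object_size=16):
            print(f"{label:>14} {name}: {size}x{size} map, object spans {cells:.2f} cells per side")
```

Under these assumed values, a 16-pixel object spans half a cell at the deepest classification stage but eight cells at the first detail stage, which is the kind of gap a detail-oriented branch is meant to close. How the two branches are actually fused and trained is specified in the full article, not in this sketch.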

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grant 61976231, Grant U1611461, Grant 61573387, and Grant 61172141, in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2019A1515011869, and in part by the Science and Technology Program of Guangzhou under Grant 201803030029.

Author information

Authors and Affiliations

School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China
Zhiwei Yan, Huicheng Zheng & Ye Li
Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, Guangzhou, China
Zhiwei Yan, Huicheng Zheng & Ye Li
Guangdong Province Key Laboratory of Information Security Technology, 135 West Xingang Road, Guangzhou, 510275, China
Zhiwei Yan, Huicheng Zheng & Ye Li
Healthcare Security Bureau of Shenzhen Municipality, Rongchao Tower, 4036 Jintian Road, Futian District, Shenzhen, 518038, China
Ye Li

Corresponding author

Correspondence to Huicheng Zheng.

Ethics declarations

Conflict of Interests

The authors declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yan, Z., Zheng, H. & Li, Y. Detail injection with heterogeneous composite backbone network for object detection. Multimed Tools Appl 81, 11621–11637 (2022). https://doi.org/10.1007/s11042-022-12241-3

Received: 19 December 2020
Revised: 22 November 2021
Accepted: 14 January 2022
Published: 18 February 2022
Issue Date: March 2022
DOI: https://doi.org/10.1007/s11042-022-12241-3
