Detection-Oriented Backbone Trained from Near Scratch and Local Feature Refinement for Small Object Detection

Yan, Zhiwei; Zheng, Huicheng; Li, Ye; Chen, Lvran

doi:10.1007/s11063-021-10493-y

Published: 20 March 2021

Volume 53, pages 1921–1943, (2021)

Neural Processing Letters

Zhiwei Yan^1,2,3, Huicheng Zheng^1,2,3 (ORCID: orcid.org/0000-0002-6729-4176), Ye Li^1,4 & Lvran Chen^1

Abstract

Current detection networks usually struggle to detect small object instances because of spatial information loss and a lack of semantics. In this paper, we propose a one-stage detector named LocalNet, which pays particular attention to modeling detailed information. LocalNet is built upon our redesigned detection-oriented backbone, called long neck ResNet, which aims to preserve more detailed information in the early stages to enhance the representation of small objects. Furthermore, to enhance the semantics in the detection layers, we propose a local detail-context module, which reintroduces the detailed information lost in the network and exploits the local context within a restricted receptive field. Moreover, we explore a method for training detectors nearly or entirely from scratch, which allows network structures to be designed with more freedom. With nearly \(94\%\) of the pretrained parameters in the backbone randomly reinitialized, our model improves the mAP of our baseline model from \(75.0\%\) to \(82.3\%\) on the PASCAL VOC dataset with an input size of \(300\times 300\) and achieves state-of-the-art accuracy. Even when trained from scratch, our model achieves \(80.8\%\) mAP, which is 5.8 percentage points higher than the mAP of our baseline model with a fully pretrained backbone.
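The gains quoted in the abstract are absolute differences in mAP, that is, percentage points rather than relative improvements. The short Python check below uses only the three mAP values reported above; the constant and function names are illustrative and are not taken from the paper.

```python
# mAP values (%) on PASCAL VOC with 300x300 input, as reported in the abstract.
BASELINE_MAP = 75.0      # baseline model with a fully pretrained backbone
NEAR_SCRATCH_MAP = 82.3  # LocalNet with ~94% of backbone parameters reinitialized
FROM_SCRATCH_MAP = 80.8  # LocalNet with the backbone trained from scratch


def absolute_gain(new: float, old: float) -> float:
    """Improvement in percentage points (new mAP minus old mAP)."""
    return round(new - old, 1)


def relative_gain(new: float, old: float) -> float:
    """Improvement expressed as a percentage of the old mAP."""
    return round(100.0 * (new - old) / old, 1)


if __name__ == "__main__":
    for label, value in [("near scratch", NEAR_SCRATCH_MAP),
                         ("from scratch", FROM_SCRATCH_MAP)]:
        print(f"{label}: +{absolute_gain(value, BASELINE_MAP)} points "
              f"({relative_gain(value, BASELINE_MAP)}% relative)")
```

The from-scratch result is 5.8 points above the baseline, which corresponds to a relative improvement of roughly 7.7%. This is why "points" is the unambiguous unit for such comparisons.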

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61976231, Grant U1611461, Grant 61573387, and Grant 61172141, in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2019A1515011869, and in part by the Science and Technology Program of Guangzhou under Grant 201803030029.

Author information

Authors and Affiliations

School of Computer Science and Engineering, Sun Yat-sen University, 135 West Xingang Road, Guangzhou, 510275, China
Zhiwei Yan, Huicheng Zheng, Ye Li & Lvran Chen
Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, 135 West Xingang Road, Guangzhou, 510275, China
Zhiwei Yan & Huicheng Zheng
Guangdong Province Key Laboratory of Information Security Technology, 135 West Xingang Road, Guangzhou, 510275, China
Zhiwei Yan & Huicheng Zheng
Healthcare Security Bureau of Shenzhen Municipality, Rongchao Tower, 4036 Jintian Road, Futian District, Shenzhen, 518038, China
Ye Li

Corresponding author

Correspondence to Huicheng Zheng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Yan, Z., Zheng, H., Li, Y. et al. Detection-Oriented Backbone Trained from Near Scratch and Local Feature Refinement for Small Object Detection. Neural Process Lett 53, 1921–1943 (2021). https://doi.org/10.1007/s11063-021-10493-y

Accepted: 11 March 2021
Published: 20 March 2021
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11063-021-10493-y