Abstract
Current detection networks usually struggle to detect small-scale object instances due to spatial information loss and lack of semantics. In this paper, we propose a one-stage detector named LocalNet, which pays specific attention to the detailed information modeling. LocalNet is built upon our redesigned detection-oriented backbone called long neck ResNet, which aims to preserve more detailed information in the early stage to enhance the representation of small objects. Furthermore, to enhance the semantics in the detection layers, we propose a local detail-context module, which reintroduces the detailed information lost in the network and exploits the local context within a restricted receptive field range. Moreover, we explore a method for training detectors nearly or totally from scratch, which provides the potential to design network structures with more freedom. With nearly \(94\%\) of the pretrained parameters randomly reinitialized in the backbone, our model improves the mAP of our baseline model from 75.0 to \(82.3\%\) on the PASCAL VOC dataset with an input size of \(300\times 300\) and achieves state-of-the-art accuracy. Even when trained from scratch, our model achieves \(80.8\%\) mAP, which is \(5.8\%\) greater than the mAP of our baseline model with a fully pretrained backbone.
Similar content being viewed by others
References
Bell S, Lawrence ZC, Bala K, Girshick R (2016) Inside-outside net: detecting objects in context with skip pooling and recurrent neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 2874–2883
Bjorck J, Gomes CP, Selman B, Weinberger KQ (2018) Understanding batch normalization. In: Advances in neural information processing systems. pp 7705–7716
Cai Z, Vasconcelos N (2018) Cascade R-CNN: delving into high quality object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 6154–6162
Chen C, Ling Q (2019) Adaptive convolution for object detection. IEEE Trans Multimedia 21(12):3205–3217
Chi C, Zhang S, Xing J, Lei Z, Li SZ, Zou X (2019) Selective refinement network for high performance face detection. In: Proceedings of the AAAI conference on artificial intelligence. pp 231–238
Chu J, Guo Z, Leng L (2018) Object detection based on multi-layer convolution feature fusion and online hard example mining. IEEE Access 6:19959–19967
Chu W, Cai D (2018) Deep feature based contextual model for object detection. Neurocomputing 275:1035–1042
Dai J, Li Y, He K, Sun J (2016) R-FCN: Object detection via region-based fully convolutional networks. In: Advances in neural information processing systems. pp 379–387
Deng J, Dong W, Socher R, Li L, Fei LF (2009) Imagenet: A large-scale hierarchical image database. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 248–255
Ding H, Jiang X, Shuai B, Liu AQ, Wang G (2018) Context contrasted feature and gated multi-scale aggregation for scene segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 2393–2402
Dong Z, Li G, Liao Y, Wang F, Ren P, Qian C (2020) Centripetalnet: pursuing high-quality keypoint pairs for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 10519–10528
Dvornik N, Shmelkov K, Mairal J, Schmid C (2017) Blitznet: a real-time deep network for scene understanding. In: Proceedings of the IEEE international conference on computer vision. pp 4154–4162
Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The Pascal visual object classes (VOC) challenge. Int J Comput Vis 88(2):303–338
Fu C, Liu W, Ranga A, Tyagi A, Berg A (2017) DSSD: Deconvolutional single shot detector. arXiv preprint arXiv:1701.06659
Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE international conference on computer vision. pp 1440–1448
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 580–587
Gong T, Liu B, Chu Q, Yu N (2019) Using multi-label classification to improve object detection. Neurocomputing 370:174–185
Guo C, Fan B, Zhang Q, Xiang S, Pan C (2020) Augfpn: Improving multi-scale feature learning for object detection. In: Proceedings of the IEEE and pattern recognition. pp 12595–12604
Hariharan B, Arbeláez P, Girshick R, Malik J (2015) Hypercolumns for object segmentation and fine-grained localization. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 447–456
He K, Girshick R, Dollár P (2019) Rethinking imagenet pre-training. In: Proceedings of the IEEE international conference on computer vision. pp 4918–4927
He K, Gkioxari G, Dollár PRG (2017) Mask R-CNN. In: Proceedings of the IEEE international conference on computer vision. pp 2980–2988
He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: surpassing human-level performance on imagenet classification. In: Proceedings of the IEEE international conference on computer vision. pp 1026–1034
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 770–778
Hoiem D, Chodpathumwan Y, Dai Q (2012) Diagnosing error in object detectors. In: Proceedings of the European conference on computer vision. Springer, pp 340–353
Hong C, Yu J, Zhang J, Jin X, Lee KH (2019) Multimodal face-pose estimation with multitask manifold deep learning. IEEE Trans Ind Inform 15(7):3952–3961
Huang G, Liu Z, Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 2261–2269
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International conference on machine learning. pp 448–456
Kong T, Sun F, Yao A, Liu H, Lu M, Chen Y (2017) RON: Reverse connection with objectness prior networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 5244–5252
Law H, Deng J (2018) Cornernet: detecting objects as paired keypoints. In: Proceedings of the European conference on computer vision. pp 1–17
Li J, Liang X, Li J, Wei Y, Xu T, Feng J, Yan S (2018) Multistage object detection with group recursive learning. IEEE Trans Multimedia 20(7):1645–1655
Li S, Yang L, Huang J, Hua X, Zhang L (2019) Dynamic anchor feature selection for single-shot object detection. In: Proceedings of the IEEE international conference on computer vision. pp 6609–6618
Li Y, Zheng H, Yan Z, Chen L (2019) Detail preservation and feature refinement for object detection. Neurocomputing 359:209–218
Lin T, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision. pp 2980–2988
Lin T, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick C (2014) Microsoft COCO: Common objects in context. In: Proceedings of the European conference on computer vision. pp 740–755
Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie SJ (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 936–944
Liu S, Huang D, Wang Y (2018) Receptive field block net for accurate and fast object detection. In: Proceedings of the European conference on computer vision. pp 1–16
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C, Berg AC (2016) SSD: Single shot multibox detector. In: Proceedings of the European conference on computer vision. pp 21–37
Loshchilov I, Hutter F (2017) SGDR: stochastic gradient descent with warm restarts. In: International conference on learning representations
Pang Y, Wang T, Anwer RM, Khan FS, Shao L (2019) Efficient featurized image pyramid network for single shot detector. In: Proceedings of the IEEE conference on computer vision and pattern recognition
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 779–788
Redmon J, Farhadi A (2017) YOLO9000: Better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 7263–7271
Redmon J, Farhadi A (2018) Yolov3: An incremental improvement. pp 1–6. arXiv preprint arXiv:1804.02767
Ren S, He K, Girshick R, Sun J (2017) Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 39(6):1137–1149
Shen Z, Liu Z, Li J, Jiang Y, Chen Y, Xue X (2017) Dsod: learning deeply supervised object detectors from scratch. In: Proceedings of the IEEE international conference on computer vision. pp. 1919–1927
Shen Z, Liu Z, Li J, Jiang Y, Chen Y, Xue X (2019) Object detection from scratch with deep supervision. IEEE Trans Pattern Anal Mach Intell 42:398–412
Shen Z, Shi H, Feris R, Cao L, Yan S, Liu D, Wang X, Xue X, Huang TS (2017) Learning object detectors from scratch with gated recurrent feature pyramids. arXiv preprint arXiv:1712.00886
Shrivastava A, Gupta A, Girshick R (2016) Training region-based object detectors with online hard example mining. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 761–769
Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: Proceedings of the international conference on learning representations. pp 1–14
Sun F, Kong T, Huang W, Tan C, Fang B, Liu H (2019) Feature pyramid reconfiguration with consistent loss for object detection. IEEE Trans Image Process 28(10):5041–5051
Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In: Proceedings of the AAAI conference on artificial intelligence. pp 4278–4284
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 1–9
Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 2818–2826
Wang G, Xiong Z, Liu D, Luo C (2018) Cascade mask generation framework for fast small object detection. In: Proceedings of the IEEE international conference on multimedia and expo. pp 1–6
Wang N, Gao Y, Chen H, Wang P, Tian Z, Shen C, Zhang Y (2020) Nas-fcos: Fast neural architecture search for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 11943–11951
Woo S, Hwang S, Kweon IS (2018) StairNet: Top-down semantic aggregation for accurate one shot detection. In: Proceedings of the IEEE winter conference on applications of computer vision. pp 1093–1102
Wu Y, He K (2018) Group normalization. In: Proceedings of the European conference on computer vision. pp 3–19
Yang D, Zou Y, Zhang J, Li G (2019) C-rpns: promoting object detection in real world via a cascade structure of region proposal networks. Neurocomputing 367:20–30
Yang Z, Liu S, Hu H, Wang L, Lin S (2019) Reppoints: point set representation for object detection. In: Proceedings of the IEEE international conference on computer vision. pp 9657–9666
Yu J, Tan M, Zhang H, Rui Y, Tao D (2019) Hierarchical deep click feature prediction for fine-grained image recognition. In: IEEE transactions on pattern analysis and machine intelligence pp 1–14
Yu J, Tao D, Wang M (2012) Adaptive hypergraph learning and its application in image classification. IEEE Trans Image Process 21(7):3262–3272
Yu J, Zhu C, Zhang J, Huang Q, Tao D (2020) Spatial pyramid-enhanced NetVLAD with weighted triplet loss for place recognition. IEEE Trans Neural Netw Learn Syst 31(2):661–674
Zhang H, Wang K, Tian Y, Gou C, Wang F (2018) MFR-CNN: Incorporating multi-scale features and global information for traffic object detection. IIEEE Trans Veh Technol 67(9):8019–8030
Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018) Single-shot refinement neural network for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 4203–4212
Zhang T, Hao L, Guo G (2019) A feature enriching object detection framework with weak segmentation loss. Neurocomputing 335:72–80
Zhang Z, Qiao S, Xie C, Shen W, Wang B, Yuille AL (2018) Single-shot object detection with enriched semantics. In: Proceedings of the IEEE conference on computer vision and pattern recognition
Zhao H, Zhiwei L, Lufa F, Tianqi Z (2020) A balanced feature fusion SSD for object detection. Neural Process Lett 51:2789–2806
Zheng H, Chen J, Chen L, Yan Z (2020) Feature enhancement for multi-scale object detection. Neural Process Lett 51:1907–1919
Zhou P, Ni B, Geng C, Hu J, Xu Y (2018) Scale-transferrable object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 528–537
Zhou X, Zhuo J, Krahenbuhl P (2019) Bottom-up object detection by grouping extreme and center points. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 850–859
Zhu R, Zhang S, Wang X, Wen L, Shi H, Bo L, Mei T (2019) Scratchdet: training single-shot object detectors from scratch. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp 2268–2277
Acknowledgements
This work was supported in part by the National Natural Science Foundation of China under Grant 61976231, Grant U1611461, Grant 61573387, and Grant 61172141, in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2019A1515011869, and in part by the Science and Technology Program of Guangzhou under Grant 201803030029.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Yan, Z., Zheng, H., Li, Y. et al. Detection-Oriented Backbone Trained from Near Scratch and Local Feature Refinement for Small Object Detection. Neural Process Lett 53, 1921–1943 (2021). https://doi.org/10.1007/s11063-021-10493-y
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11063-021-10493-y