Detection-Oriented Backbone Trained from Near Scratch and Local Feature Refinement for Small Object Detection

Neural Processing Letters

Abstract

Current detection networks often struggle to detect small object instances because of spatial information loss and a lack of semantics. In this paper, we propose a one-stage detector named LocalNet, which pays particular attention to modeling detailed information. LocalNet is built upon our redesigned detection-oriented backbone, called long neck ResNet, which aims to preserve more detailed information in the early stage to enhance the representation of small objects. Furthermore, to enhance the semantics in the detection layers, we propose a local detail-context module, which reintroduces the detailed information lost in the network and exploits the local context within a restricted receptive field range. Moreover, we explore a method for training detectors nearly or totally from scratch, which offers more freedom in designing network structures. With nearly \(94\%\) of the pretrained backbone parameters randomly reinitialized, our model improves the mAP of our baseline model from \(75.0\%\) to \(82.3\%\) on the PASCAL VOC dataset with an input size of \(300\times 300\) and achieves state-of-the-art accuracy. Even when trained from scratch, our model achieves \(80.8\%\) mAP, which is 5.8 percentage points higher than the mAP of our baseline model with a fully pretrained backbone.
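The figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names and the per-layer parameter counts are hypothetical, chosen so that the reinitialized share comes out at 0.94, and they do not describe the actual long neck ResNet or the authors' training procedure. It shows how a reinitialized-parameter fraction could be computed when only some layers keep their pretrained weights, and how the reported mAP gains follow from the stated baseline.

```python
def reinit_fraction(param_counts, keep_pretrained):
    """Share of backbone parameters that are randomly reinitialized.

    param_counts: dict mapping layer name -> number of parameters.
    keep_pretrained: set of layer names that keep pretrained weights.
    """
    total = sum(param_counts.values())
    kept = sum(n for name, n in param_counts.items() if name in keep_pretrained)
    return (total - kept) / total


def map_gain(baseline_map, model_map):
    """Absolute mAP difference in percentage points, rounded to one decimal."""
    return round(model_map - baseline_map, 1)


# Hypothetical toy backbone: counts are made up, not taken from the paper.
toy_counts = {"stem": 6, "stage1": 10, "stage2": 30, "stage3": 54}
fraction = reinit_fraction(toy_counts, keep_pretrained={"stem"})

# mAP values as reported in the abstract (PASCAL VOC, 300x300 input).
gain_near_scratch = map_gain(75.0, 82.3)
gain_from_scratch = map_gain(75.0, 80.8)
```

With the reported values, the near-scratch model gains 7.3 percentage points over the baseline and the from-scratch model gains 5.8, which matches the difference stated in the abstract.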



Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 61976231, Grant U1611461, Grant 61573387, and Grant 61172141, in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2019A1515011869, and in part by the Science and Technology Program of Guangzhou under Grant 201803030029.

Author information

Corresponding author

Correspondence to Huicheng Zheng.

About this article

Cite this article

Yan, Z., Zheng, H., Li, Y. et al. Detection-Oriented Backbone Trained from Near Scratch and Local Feature Refinement for Small Object Detection. Neural Process Lett 53, 1921–1943 (2021). https://doi.org/10.1007/s11063-021-10493-y

  • DOI: https://doi.org/10.1007/s11063-021-10493-y