Skip to main content
Log in

Single-shot augmentation detector for object detection

  • Original Article
  • Published:
Neural Computing and Applications Aims and scope Submit manuscript

Abstract

Single-shot multibox detector (SSD), one of the top-performing object detection algorithms, has achieved both high accuracy and fast speed. However, its performance is limited by two factors: (1) anchors are generated uniformly over the image by predefined manners, and (2) multiscale features from the feature pyramid are used to detect objects independently. In this paper, we propose a single-shot augmentation detector, called SSADet, that significantly improves the detection accuracy of the original SSD with a slight decrease in speed. SSADet mainly consists of two modules, namely the anchor prediction module and the feature fusion module. These two modules aim to generate anchors with any scale and aspect ratio and fuse multiscale features from different layers, respectively. Specifically, we define an anchor generator whose parameter weights are predicted dynamically by a small neural network and then use the anchor generator to generate optimal anchors over the image in the anchor prediction module. In the feature fusion module, multiscale features from the feature pyramid are concatenated to generate a new feature pyramid through a set of downsampling and upsampling operations. The new feature pyramid takes the generated anchors as the input from the anchor prediction module to predict the final detection results. Extensive experiments are conducted to demonstrate the effectiveness of SSADet on the PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO detection datasets. The experimental results show that SSADet achieves state-of-the-art detection performance with high efficiency.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4

Similar content being viewed by others

References

  1. Bell S, Lawrence Zitnick C, Bala K, Girshick R (2016) Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2874–2883

  2. Cai Z, Vasconcelos N (2018) Cascade r-cnn: delving into high quality object detection. In: IEEE CVPR

  3. He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In Computer vision (ICCV), 2017 IEEE International conference on IEEE, pp. 2980–2988

  4. Ren S, He K, Girshick R, Sun J (2017) Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 6:1137–1149

    Article  Google Scholar 

  5. Hu R, Dollár P, He K, Darrell T, Girshick R (2018) Learning to segment every thing. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4233–4241

  6. Fu CY, Liu W, Ranga A, Tyagi A, Berg AC (2017) Dssd: Deconvolutional single shot detector. arXiv preprint arXiv:1701.06659 2017

  7. Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) Ssd: single shot multibox detector. In: European conference on computer vision. Springer, pp. 21–37

  8. Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767

  9. Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018) Single-shot refinement neural network for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4203–4212

  10. Woo S, Hwang S, Kweon IS (2018) Stairnet: top-down semantic aggregation for accurate one shot detection. In: 2018 IEEE Winter conference on applications of computer vision (WACV). IEEE, pp. 1093–1102

  11. Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vision 88(2):303–338

    Article  Google Scholar 

  12. Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: European conference on computer vision. Springer, Berlin pp. 740–755

  13. Leng J, Liu Y (2018) An enhanced SSD with feature fusion and visual reasoning for object detection. Neural Comput Appl 31:6549

    Article  Google Scholar 

  14. Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7263–7271

  15. He K, Zhang X, Ren S, Sun J (2016) Identity mappings in deep residual networks. In: European conference on computer vision. Springer, pp. 630–645

  16. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

  17. Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 580–587

  18. Zitnick CL, Dollár P (2014) Edge boxes: locating object proposals from edges. In: European conference on computer vision. Springer, pp. 391–405

  19. Uijlings JR, Van De Sande KE, Gevers T, Smeulders AW (2013) Selective search for object recognition. Int J Comput Vision 104(2):154–171

    Article  Google Scholar 

  20. He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916

    Article  Google Scholar 

  21. Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp. 1440–1448

  22. Chen X, Gupta A (2017) Spatial memory for context reasoning in object detection. In: Proceedings of the IEEE international conference on computer vision, pp. 4086–4096

  23. Chen Z, Huang S, Tao D (2018) Context refinement for object detection. In: The European conference on computer vision (ECCV)

  24. Li J, Wei Y, Liang X, Dong J, Xu T, Feng J, Yan S (2017) Attentive contexts for object detection. IEEE Trans Multimedia 19(5):944–954

    Article  Google Scholar 

  25. Tang X, Du DK, He Z, Liu J (2018) Pyramidbox: a context-assisted single shot face detector. In: Proceedings of the European conference on computer vision (ECCV), pp. 797–813

  26. Shrivastava A, Gupta A, Girshick R (2016) Training region-based object detectors with online hard example mining. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 761–769

  27. Wang X, Shrivastava A, Gupta A (2017) A-fast-rcnn: Hard positive generation via adversary for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2606–2615

  28. Chen Y, Li W, Sakaridis C, Dai D, Van Gool L (2018) Domain adaptive faster r-cnn for object detection in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3339–3348

  29. Dai J, Li Y, He K, Sun J (2016) R-fcn: Object detection via region-based fully convolutional networks In: Advances in neural information processing systems, pp. 379–387

  30. Lee H, Eum S, Kwon H (2017) Me r-cnn: Multi-expert r-cnn for object detection. arXiv preprint arXiv:1704.01069

  31. Li Z, Peng C, Yu G, Zhang X, Deng Y, Sun J (2017) Light-head r-cnn: In defense of two-stage object detector. arXiv preprint arXiv:1711.07264

  32. Zhu Y, Zhao C, Wang J, Zhao X, Wu Y, Lu H (2017) Couplenet: coupling global structure with local parts for object detection. In: Proceedings of the IEEE international conference on computer vision, pp. 4126–4134

  33. Hara K, Liu MY, Tuzel O, Farahmand A-m (2017) Attentional network for visual object detection. arXiv preprint arXiv:1702.01478

  34. Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788

  35. Neubeck A, Van Gool L (2006) Efficient non-maximum suppression. In: 18th International conference on pattern recognition (ICPR’06), vol. 3.IEEE, 2006, pp. 850–855

  36. Shen Z, Liu Z, Li J, Jiang YG, Chen Y, Xue X (2017) DSOD: learning deeply supervised object detectors from scratch. In: Proceedings of the IEEE international conference on computer vision, pp. 1919–1927

  37. Iandola F, Moskewicz M, Karayev S, Girshick R, Darrell T, Keutzer K (2014) Densenet: implementing efficient convnet descriptor pyramids. arXiv preprint arXiv:1404.1869

  38. Zhao Q, Sheng T, Wang Y, Tang Z, Chen Y, Cai L, Ling H (2019) M2det: a single-shot object detector based on multi-level feature pyramid network. In: Proceedings of the AAAI conference on artificial intelligence, vol. 33, pp. 9259–9266

  39. Kong T, Sun F, Tan C, Liu H, Huang W (2018) Deep feature pyramid reconfiguration for object detection. In: Proceedings of the European conference on computer vision (ECCV), 2018, pp. 169–185

  40. Kong T, Yao A, Chen Y, Sun F (2016) Hypernet: towards accurate region proposal generation and joint object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 845–853

  41. Liu W, Rabinovich A, Berg AC (2015) Parsenet: looking wider to see better. arXiv preprint arXiv:1506.04579

  42. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440

  43. Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2117–2125

  44. Jeong J, Park H, Kwak N (2017) Enhancement of SSD by concatenating feature maps for object detection.

  45. Pang J, Chen K, Shi J, Feng H, Ouyang W, Lin D (2019) Libra r-cnn: towards balanced learning for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 821–830

  46. Yang T, Zhang X, Li Z, Zhang W, Sun J (2018) Metaanchor: learning to detect objects with customized anchors. In: Advances in neural information processing systems, 2018, pp. 318–328

  47. Wang J, Chen K, Yang S, Loy CC, Lin D (2019) Region proposal by guided anchoring. arXiv preprint arXiv:1901.03278 2019

  48. Zhu C, He Y, Savvides M (2019) Feature selective anchor-free module for single-shot object detection. arXiv preprint arXiv:1903.00621

  49. Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167

  50. Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2014) Semantic image segmentation with deep convolutional nets and fully connected CRFS. arXiv preprint arXiv:1412.7062

  51. Gidaris S, Komodakis N (2015) Object detection via a multi-region and semantic segmentation-aware CNN model. In Proceedings of the IEEE international conference on computer vision, pp. 1134–1142

  52. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1–9

  53. Kong T, Sun F, Yao A, Liu H, Lu M, Chen Y (2017) Ron: reverse connection with objectness prior networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5936–5944

  54. Dai J, Qi H, Xiong Y, Li Y, Zhang G, Hu H, Wei Y (2017) Deformable convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp. 764–773

  55. Tychsen-Smith L, Petersson L (2018) Improving object localization with fitness NMS and bounded IOU loss. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 6877–6885

  56. Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp. 2980–2988

Download references

Acknowledgements

This project was partially supported by Grants from Natural Science Foundation of China 71671178, 9154620 and 61202321 and the Open Project of the Key Lab of Big Data Mining and Knowledge Management. It was also supported by Hainan Provincial Department of Science and Technology under Grant No. ZDKJ2016021 and by Guangdong Provincial Science and Technology Project 2016B010127004.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jiaxu Leng.

Ethics declarations

Conflict of interest

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Leng, J., Liu, Y. Single-shot augmentation detector for object detection. Neural Comput & Applic 33, 3583–3596 (2021). https://doi.org/10.1007/s00521-020-05202-0

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00521-020-05202-0

Keywords

Navigation