Abstract
Single-shot multibox detector (SSD), one of the top-performing object detection algorithms, has achieved both high accuracy and fast speed. However, its performance is limited by two factors: (1) anchors are generated uniformly over the image by predefined manners, and (2) multiscale features from the feature pyramid are used to detect objects independently. In this paper, we propose a single-shot augmentation detector, called SSADet, that significantly improves the detection accuracy of the original SSD with a slight decrease in speed. SSADet mainly consists of two modules, namely the anchor prediction module and the feature fusion module. These two modules aim to generate anchors with any scale and aspect ratio and fuse multiscale features from different layers, respectively. Specifically, we define an anchor generator whose parameter weights are predicted dynamically by a small neural network and then use the anchor generator to generate optimal anchors over the image in the anchor prediction module. In the feature fusion module, multiscale features from the feature pyramid are concatenated to generate a new feature pyramid through a set of downsampling and upsampling operations. The new feature pyramid takes the generated anchors as the input from the anchor prediction module to predict the final detection results. Extensive experiments are conducted to demonstrate the effectiveness of SSADet on the PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO detection datasets. The experimental results show that SSADet achieves state-of-the-art detection performance with high efficiency.




Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.References
Bell S, Lawrence Zitnick C, Bala K, Girshick R (2016) Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2874–2883
Cai Z, Vasconcelos N (2018) Cascade r-cnn: delving into high quality object detection. In: IEEE CVPR
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask r-cnn. In Computer vision (ICCV), 2017 IEEE International conference on IEEE, pp. 2980–2988
Ren S, He K, Girshick R, Sun J (2017) Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans Pattern Anal Mach Intell 6:1137–1149
Hu R, Dollár P, He K, Darrell T, Girshick R (2018) Learning to segment every thing. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4233–4241
Fu CY, Liu W, Ranga A, Tyagi A, Berg AC (2017) Dssd: Deconvolutional single shot detector. arXiv preprint arXiv:1701.06659 2017
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) Ssd: single shot multibox detector. In: European conference on computer vision. Springer, pp. 21–37
Redmon J, Farhadi A (2018) Yolov3: an incremental improvement. arXiv preprint arXiv:1804.02767
Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018) Single-shot refinement neural network for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4203–4212
Woo S, Hwang S, Kweon IS (2018) Stairnet: top-down semantic aggregation for accurate one shot detection. In: 2018 IEEE Winter conference on applications of computer vision (WACV). IEEE, pp. 1093–1102
Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The pascal visual object classes (voc) challenge. Int J Comput Vision 88(2):303–338
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: European conference on computer vision. Springer, Berlin pp. 740–755
Leng J, Liu Y (2018) An enhanced SSD with feature fusion and visual reasoning for object detection. Neural Comput Appl 31:6549
Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 7263–7271
He K, Zhang X, Ren S, Sun J (2016) Identity mappings in deep residual networks. In: European conference on computer vision. Springer, pp. 630–645
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 580–587
Zitnick CL, Dollár P (2014) Edge boxes: locating object proposals from edges. In: European conference on computer vision. Springer, pp. 391–405
Uijlings JR, Van De Sande KE, Gevers T, Smeulders AW (2013) Selective search for object recognition. Int J Comput Vision 104(2):154–171
He K, Zhang X, Ren S, Sun J (2015) Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans Pattern Anal Mach Intell 37(9):1904–1916
Girshick R (2015) Fast r-cnn. In: Proceedings of the IEEE international conference on computer vision, pp. 1440–1448
Chen X, Gupta A (2017) Spatial memory for context reasoning in object detection. In: Proceedings of the IEEE international conference on computer vision, pp. 4086–4096
Chen Z, Huang S, Tao D (2018) Context refinement for object detection. In: The European conference on computer vision (ECCV)
Li J, Wei Y, Liang X, Dong J, Xu T, Feng J, Yan S (2017) Attentive contexts for object detection. IEEE Trans Multimedia 19(5):944–954
Tang X, Du DK, He Z, Liu J (2018) Pyramidbox: a context-assisted single shot face detector. In: Proceedings of the European conference on computer vision (ECCV), pp. 797–813
Shrivastava A, Gupta A, Girshick R (2016) Training region-based object detectors with online hard example mining. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 761–769
Wang X, Shrivastava A, Gupta A (2017) A-fast-rcnn: Hard positive generation via adversary for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2606–2615
Chen Y, Li W, Sakaridis C, Dai D, Van Gool L (2018) Domain adaptive faster r-cnn for object detection in the wild. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3339–3348
Dai J, Li Y, He K, Sun J (2016) R-fcn: Object detection via region-based fully convolutional networks In: Advances in neural information processing systems, pp. 379–387
Lee H, Eum S, Kwon H (2017) Me r-cnn: Multi-expert r-cnn for object detection. arXiv preprint arXiv:1704.01069
Li Z, Peng C, Yu G, Zhang X, Deng Y, Sun J (2017) Light-head r-cnn: In defense of two-stage object detector. arXiv preprint arXiv:1711.07264
Zhu Y, Zhao C, Wang J, Zhao X, Wu Y, Lu H (2017) Couplenet: coupling global structure with local parts for object detection. In: Proceedings of the IEEE international conference on computer vision, pp. 4126–4134
Hara K, Liu MY, Tuzel O, Farahmand A-m (2017) Attentional network for visual object detection. arXiv preprint arXiv:1702.01478
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788
Neubeck A, Van Gool L (2006) Efficient non-maximum suppression. In: 18th International conference on pattern recognition (ICPR’06), vol. 3.IEEE, 2006, pp. 850–855
Shen Z, Liu Z, Li J, Jiang YG, Chen Y, Xue X (2017) DSOD: learning deeply supervised object detectors from scratch. In: Proceedings of the IEEE international conference on computer vision, pp. 1919–1927
Iandola F, Moskewicz M, Karayev S, Girshick R, Darrell T, Keutzer K (2014) Densenet: implementing efficient convnet descriptor pyramids. arXiv preprint arXiv:1404.1869
Zhao Q, Sheng T, Wang Y, Tang Z, Chen Y, Cai L, Ling H (2019) M2det: a single-shot object detector based on multi-level feature pyramid network. In: Proceedings of the AAAI conference on artificial intelligence, vol. 33, pp. 9259–9266
Kong T, Sun F, Tan C, Liu H, Huang W (2018) Deep feature pyramid reconfiguration for object detection. In: Proceedings of the European conference on computer vision (ECCV), 2018, pp. 169–185
Kong T, Yao A, Chen Y, Sun F (2016) Hypernet: towards accurate region proposal generation and joint object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 845–853
Liu W, Rabinovich A, Berg AC (2015) Parsenet: looking wider to see better. arXiv preprint arXiv:1506.04579
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440
Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2117–2125
Jeong J, Park H, Kwak N (2017) Enhancement of SSD by concatenating feature maps for object detection.
Pang J, Chen K, Shi J, Feng H, Ouyang W, Lin D (2019) Libra r-cnn: towards balanced learning for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 821–830
Yang T, Zhang X, Li Z, Zhang W, Sun J (2018) Metaanchor: learning to detect objects with customized anchors. In: Advances in neural information processing systems, 2018, pp. 318–328
Wang J, Chen K, Yang S, Loy CC, Lin D (2019) Region proposal by guided anchoring. arXiv preprint arXiv:1901.03278 2019
Zhu C, He Y, Savvides M (2019) Feature selective anchor-free module for single-shot object detection. arXiv preprint arXiv:1903.00621
Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167
Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL (2014) Semantic image segmentation with deep convolutional nets and fully connected CRFS. arXiv preprint arXiv:1412.7062
Gidaris S, Komodakis N (2015) Object detection via a multi-region and semantic segmentation-aware CNN model. In Proceedings of the IEEE international conference on computer vision, pp. 1134–1142
Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1–9
Kong T, Sun F, Yao A, Liu H, Lu M, Chen Y (2017) Ron: reverse connection with objectness prior networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5936–5944
Dai J, Qi H, Xiong Y, Li Y, Zhang G, Hu H, Wei Y (2017) Deformable convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp. 764–773
Tychsen-Smith L, Petersson L (2018) Improving object localization with fitness NMS and bounded IOU loss. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 6877–6885
Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp. 2980–2988
Acknowledgements
This project was partially supported by Grants from Natural Science Foundation of China 71671178, 9154620 and 61202321 and the Open Project of the Key Lab of Big Data Mining and Knowledge Management. It was also supported by Hainan Provincial Department of Science and Technology under Grant No. ZDKJ2016021 and by Guangdong Provincial Science and Technology Project 2016B010127004.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Leng, J., Liu, Y. Single-shot augmentation detector for object detection. Neural Comput & Applic 33, 3583–3596 (2021). https://doi.org/10.1007/s00521-020-05202-0
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00521-020-05202-0