Single-shot augmentation detector for object detection

Leng, Jiaxu; Liu, Ying

doi:10.1007/s00521-020-05202-0

Original Article
Published: 29 July 2020

Volume 33, pages 3583–3596, (2021)

Neural Computing and Applications

Abstract

The single-shot multibox detector (SSD), one of the top-performing object detection algorithms, achieves both high accuracy and fast speed. However, its performance is limited by two factors: (1) anchors are generated uniformly over the image in a predefined manner, and (2) multiscale features from the feature pyramid are used to detect objects independently. In this paper, we propose a single-shot augmentation detector, called SSADet, that significantly improves the detection accuracy of the original SSD at the cost of a slight decrease in speed. SSADet consists mainly of two modules: the anchor prediction module, which generates anchors of any scale and aspect ratio, and the feature fusion module, which fuses multiscale features from different layers. Specifically, in the anchor prediction module we define an anchor generator whose weights are predicted dynamically by a small neural network, and we then use this generator to produce optimal anchors over the image. In the feature fusion module, multiscale features from the feature pyramid are concatenated to generate a new feature pyramid through a set of downsampling and upsampling operations. The new feature pyramid takes the anchors generated by the anchor prediction module as input to predict the final detection results. Extensive experiments on the PASCAL VOC 2007, PASCAL VOC 2012 and MS COCO detection datasets demonstrate the effectiveness of SSADet. The experimental results show that SSADet achieves state-of-the-art detection performance with high efficiency.
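The feature fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the resampling operators, channel counts or any convolution applied after concatenation. So the nearest-neighbour resampling, the square feature maps, and the names `resize_nearest` and `fuse_pyramid` are all assumptions made for illustration. The sketch only shows the general idea that each level of the new pyramid is built from every level of the original pyramid, resampled to a common resolution and concatenated along the channel axis.

```python
import numpy as np


def resize_nearest(fmap, size):
    """Resample a (C, H, W) map to (C, size, size) by nearest-neighbour lookup.

    The same routine upsamples (size > H) and downsamples (size < H).
    This operator is an assumption; the abstract does not name one.
    """
    _, h, w = fmap.shape
    rows = np.arange(size) * h // size  # source row for each target row
    cols = np.arange(size) * w // size  # source column for each target column
    return fmap[:, rows[:, None], cols[None, :]]


def fuse_pyramid(pyramid):
    """Return a new pyramid in which every level sees all original levels.

    For each target level, all original maps are resampled to that level's
    resolution and concatenated along the channel axis.
    """
    fused = []
    for target in pyramid:
        size = target.shape[1]
        resized = [resize_nearest(f, size) for f in pyramid]
        fused.append(np.concatenate(resized, axis=0))
    return fused


# Hypothetical three-level pyramid: (channels, spatial size) per level.
rng = np.random.default_rng(0)
pyramid = [rng.standard_normal((c, s, s)) for c, s in [(8, 16), (16, 8), (32, 4)]]

new_pyramid = fuse_pyramid(pyramid)
shapes = [f.shape for f in new_pyramid]
print(shapes)
```

In this sketch every fused level carries the channels of all three input levels while keeping its own spatial resolution. A real detector would typically follow the concatenation with learned layers to mix and compress the channels, which is omitted here.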

Acknowledgements

This project was partially supported by grants from the Natural Science Foundation of China (71671178, 9154620 and 61202321) and by the Open Project of the Key Lab of Big Data Mining and Knowledge Management. It was also supported by the Hainan Provincial Department of Science and Technology under Grant No. ZDKJ2016021 and by Guangdong Provincial Science and Technology Project 2016B010127004.

Author information

Authors and Affiliations

College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
Jiaxu Leng
School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China
Ying Liu

Corresponding author

Correspondence to Jiaxu Leng.

Ethics declarations

Conflict of interest

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Leng, J., Liu, Y. Single-shot augmentation detector for object detection. Neural Comput & Applic 33, 3583–3596 (2021). https://doi.org/10.1007/s00521-020-05202-0

Received: 16 April 2020
Accepted: 11 July 2020
Published: 29 July 2020
Issue Date: April 2021
DOI: https://doi.org/10.1007/s00521-020-05202-0
