An enhanced SSD with feature fusion and visual reasoning for object detection

  • Original Article
  • Published in: Neural Computing and Applications

Abstract

The Single Shot MultiBox Detector (SSD) is one of the top-performing object detection algorithms in terms of both accuracy and speed. SSD achieves impressive performance on various datasets by using different output layers for object detection. However, each layer in the feature pyramid is used independently, so SSD considers only the fine-grained details of objects and ignores the context surrounding them. In this paper, we propose an enhanced SSD, called ESSD, that improves on the conventional SSD by fusing the feature maps of different output layers rather than adding layers close to the input data. Our method uses two-way transfer of feature information together with feature fusion to enhance the network. To further assist object detection, we propose a visual reasoning method that fully exploits the relationships between objects instead of relying only on the features of the objects themselves. This visual reasoning proves especially effective for detecting objects that are small or have small features. To evaluate the proposed ESSD, we trained the model on the VOC2007 and VOC2012 training sets and evaluated performance on the Pascal VOC2007 test set. For \(300 \times 300\) input, ESSD achieves 79.2% mean average precision (mAP) at 52.0 frames per second (FPS); for \(512 \times 512\) input, it achieves 82.4% mAP at 18.6 FPS. These results demonstrate that our method achieves state-of-the-art mAP, outperforming the conventional SSD and other advanced detectors.
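
The abstract describes fusing feature maps from different SSD output layers so that finer levels of the pyramid also see context from coarser ones. As a rough illustration, the sketch below shows one way such a fusion block could be written in PyTorch. The layer names (conv4_3, fc7), channel widths, and the project-upsample-add-smooth fusion rule are illustrative assumptions, not the authors' exact ESSD design, which additionally uses two-way (bidirectional) information transfer and a visual reasoning module not shown here.

```python
# Hypothetical sketch of cross-layer feature fusion in an SSD-style pyramid.
# Layer names, channel counts, and the fusion rule are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusionBlock(nn.Module):
    """Fuses a deeper (coarser) feature map into a shallower (finer) one."""

    def __init__(self, shallow_ch, deep_ch, out_ch):
        super().__init__()
        # 1x1 convolutions project both inputs to a common channel width.
        self.lateral = nn.Conv2d(shallow_ch, out_ch, kernel_size=1)
        self.top_down = nn.Conv2d(deep_ch, out_ch, kernel_size=1)
        # 3x3 convolution smooths the fused map before a detection head uses it.
        self.smooth = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, shallow, deep):
        # Upsample the deep map to the shallow map's spatial size, then add.
        deep_up = F.interpolate(self.top_down(deep), size=shallow.shape[-2:],
                                mode="bilinear", align_corners=False)
        fused = self.lateral(shallow) + deep_up
        return self.smooth(F.relu(fused))


if __name__ == "__main__":
    # Example sizes loosely modelled on SSD300's conv4_3 (38x38) and fc7 (19x19) maps.
    conv4_3 = torch.randn(1, 512, 38, 38)
    fc7 = torch.randn(1, 1024, 19, 19)
    fuse = FusionBlock(shallow_ch=512, deep_ch=1024, out_ch=256)
    print(fuse(conv4_3, fc7).shape)  # torch.Size([1, 256, 38, 38])
```

Projecting both maps to a common channel width before adding keeps the fused map's size independent of the pyramid level, which is one common way top-down feature fusion is implemented in FPN- and DSSD-style detectors.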

Acknowledgements

This project was partially supported by Grants from the Natural Science Foundation of China (71671178, 9154620, and 61202321) and by the open project of the Key Lab of Big Data Mining and Knowledge Management. It was also supported by the Hainan Provincial Department of Science and Technology under Grant No. ZDKJ2016021 and by Guangdong Provincial Science and Technology Project 2016B010127004.

Author information

Corresponding author

Correspondence to Jiaxu Leng.

Ethics declarations

Conflict of interest

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

About this article

Cite this article

Leng, J., Liu, Y. An enhanced SSD with feature fusion and visual reasoning for object detection. Neural Comput & Applic 31, 6549–6558 (2019). https://doi.org/10.1007/s00521-018-3486-1
