SFSSD: Shallow Feature Fusion Single Shot Multibox Detector

  • Conference paper
Communications, Signal Processing, and Systems (CSPS 2019)

Part of the book series: Lecture Notes in Electrical Engineering ((LNEE,volume 571))

Abstract

The main contribution of this paper is an approach that introduces more context to improve the accuracy of the traditional SSD (Single Shot Multibox Detector), one of the top object detection algorithms in terms of both accuracy and speed. We augment SSD with a multi-level feature fusion method at the shallow layers, which introduces contextual information and improves accuracy, especially for small objects. We call the resulting system SFSSD, for shallow feature fusion single shot multibox detector. In the feature fusion module, features from different layers and at different scales are concatenated, and a series of down-sampling blocks then generates a new feature pyramid, which is fed to the multibox detectors to predict the final detection results. On the Pascal VOC2007 test set, with training on the VOC2007 and VOC2012 training sets, the proposed network achieves 75.4 mAP (mean average precision) with an input size of 300 \(\times\) 300 and 79.7 mAP with an input size of 512 \(\times\) 512. SFSSD achieves state-of-the-art mAP, outperforming the conventional SSD, Fast R-CNN, Faster R-CNN, ION and MR-CNN.
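
The fusion step described in the abstract (concatenate features from several layers, then down-sample repeatedly to form a new pyramid) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not say how the maps are brought to a common resolution or what the down-sampling blocks contain, so the nearest-neighbour resizing, the 2 x 2 average pooling, the toy tensor shapes, and all function names below are assumptions made for illustration only.

```python
import numpy as np

def upsample_nearest(x, size):
    """Nearest-neighbour resize of a (C, H, W) map to (C, size, size).
    Assumption: the paper's actual resizing method is not given in the abstract."""
    _, h, w = x.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return x[:, rows[:, None], cols[None, :]]

def downsample_2x(x):
    """2x2 average pooling with stride 2.
    Assumption: a stand-in for the learned down-sampling blocks."""
    c, h, w = x.shape
    h2, w2 = h // 2, w // 2
    x = x[:, :h2 * 2, :w2 * 2]          # drop an odd trailing row/column
    return x.reshape(c, h2, 2, w2, 2).mean(axis=(2, 4))

def fuse_and_build_pyramid(features, levels):
    """Concatenate multi-scale features along channels, then build a pyramid."""
    target = features[0].shape[1]        # resolution of the shallowest map
    resized = [upsample_nearest(f, target) for f in features]
    fused = np.concatenate(resized, axis=0)
    pyramid = [fused]
    for _ in range(levels - 1):
        pyramid.append(downsample_2x(pyramid[-1]))
    return pyramid

# Toy feature maps (hypothetical shapes, not those of the real network).
rng = np.random.default_rng(0)
features = [
    rng.standard_normal((4, 8, 8)),      # shallow layer, high resolution
    rng.standard_normal((6, 4, 4)),      # deeper layer
    rng.standard_normal((2, 2, 2)),      # deepest layer, low resolution
]
pyramid = fuse_and_build_pyramid(features, levels=4)
print([p.shape for p in pyramid])
# [(12, 8, 8), (12, 4, 4), (12, 2, 2), (12, 1, 1)]
```

In a detector, each level of such a pyramid would be passed to its own prediction head; here only the shapes are shown, since the heads and their parameters are not described in the abstract.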

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant No. 61702073).

Author information

Corresponding author

Correspondence to Dafeng Wang.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Wang, D., Zhang, B., Cao, Y., Lu, M. (2020). SFSSD: Shallow Feature Fusion Single Shot Multibox Detector. In: Liang, Q., Wang, W., Liu, X., Na, Z., Jia, M., Zhang, B. (eds) Communications, Signal Processing, and Systems. CSPS 2019. Lecture Notes in Electrical Engineering, vol 571. Springer, Singapore. https://doi.org/10.1007/978-981-13-9409-6_316

  • DOI: https://doi.org/10.1007/978-981-13-9409-6_316

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-9408-9

  • Online ISBN: 978-981-13-9409-6

  • eBook Packages: Engineering, Engineering (R0)
