Research on improved algorithm of object detection based on feature pyramid

Multimedia Tools and Applications

Abstract

To address the low detection accuracy of SSD on small objects, this paper proposes an improved SSD object detection algorithm based on the feature pyramid (FP-SSD). In a deep convolutional neural network, high-level features carry rich semantic information but are not sensitive to translation, while low-level features have high resolution but weak representational power. The feature pyramid structure contains features at multiple scales. To combine the high-level and low-level features of the pyramid, the proposed algorithm applies a deconvolution network to the high-level features to obtain semantic information, applies a dilated convolution network to the low-level features to learn position information, and applies convolution to the middle-level features to reduce the number of feature channels; a further convolution then fuses the features. The result is a multi-scale detection structure. FP-SSD achieves a mean accuracy of 79% on PASCAL VOC2007 and 47% on MSCOCO, a substantial improvement over SSD. Experiments comparing detection accuracy and results across object scales show that FP-SSD is more accurate than SSD, with more precise localization and higher recognition confidence.
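The fusion step described in the abstract requires the three branches to produce feature maps of the same spatial size before they can be combined. The following is a minimal sketch of that size arithmetic only, not the authors' configuration: the feature map sizes 38, 19, and 10 are the SSD300 sizes for conv4_3, conv7, and conv8_2, and every kernel size, stride, padding, and dilation value below is an assumption chosen for illustration. The helper names `conv_out` and `deconv_out` are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Spatial output size of a standard (optionally dilated) convolution."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def deconv_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Spatial output size of a transposed convolution (deconvolution)."""
    return (size - 1) * stride - 2 * padding + (kernel - 1) + output_padding + 1


# Assumed SSD300-style feature map sizes (low, middle, and high pyramid levels).
low, mid, high = 38, 19, 10

# Low level: a dilated 3x3 convolution (assumed stride 2, dilation 2, padding 2).
low_to_mid = conv_out(low, kernel=3, stride=2, padding=2, dilation=2)

# Middle level: a 1x1 convolution changes channel count, not spatial size.
mid_to_mid = conv_out(mid, kernel=1)

# High level: a 3x3 transposed convolution (assumed stride 2, padding 1).
high_to_mid = deconv_out(high, kernel=3, stride=2, padding=1)

fused_sizes = (low_to_mid, mid_to_mid, high_to_mid)
print(fused_sizes)  # (19, 19, 19)
```

With these assumed settings all three branches land on a 19 x 19 grid, so their outputs could be concatenated along the channel axis and passed to a fusing convolution. Other parameter choices that align the sizes would serve equally well.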

Acknowledgments

This work is partially supported by Shanxi Science Foundation (No. 2015011045). The authors also gratefully acknowledge the helpful comments and suggestions of the reviewers, which have improved the presentation.

Author information

Corresponding author

Correspondence to Pinle Qin.

About this article

Cite this article

Qin, P., Li, C., Chen, J. et al. Research on improved algorithm of object detection based on feature pyramid. Multimed Tools Appl 78, 913–927 (2019). https://doi.org/10.1007/s11042-018-5870-3

  • DOI: https://doi.org/10.1007/s11042-018-5870-3
