Abstract
To address the low detection accuracy of SSD on small objects, this paper proposes an improved SSD object detection algorithm based on the feature pyramid (FP-SSD). In a deep convolutional neural network, high-level features carry rich semantic information but are insensitive to translation, whereas low-level features have high resolution but weaker representational power. The feature pyramid structure contains features at multiple scales. To combine the high-level and low-level features of the pyramid, the proposed algorithm applies a deconvolution network to the high-level features to obtain semantic information, applies a dilated convolution network to the low-level features to learn position information, and applies convolution to the middle-level features to reduce the number of feature channels. A further convolution then fuses the resulting features, and a multi-scale detection structure is built on the fused features. FP-SSD achieves a mean accuracy of 79% on PASCAL VOC2007 and 47% on MSCOCO, a substantial improvement over SSD. Experiments comparing detection accuracy and detection results across object scales show that FP-SSD is more accurate than SSD, with more precise localization and higher recognition confidence.
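
The fusion step described above requires the three branches to reach the same spatial resolution before a convolution can merge them. The following minimal sketch checks this with standard convolution and transposed-convolution size arithmetic. The feature-map sizes (38, 19, 10) are those of a standard SSD300 pyramid; the choice of the middle level as the fusion target and all kernel, stride, padding and dilation values are illustrative assumptions, not settings taken from the paper.

```python
def conv_out(n, kernel, stride=1, padding=0, dilation=1):
    """Output size of a (possibly dilated) convolution along one spatial axis."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def deconv_out(n, kernel, stride=1, padding=0, output_padding=0):
    """Output size of a transposed convolution (deconvolution) along one axis."""
    return (n - 1) * stride - 2 * padding + kernel + output_padding


# Three adjacent pyramid levels of an SSD300-style network (assumed here).
low, mid, high = 38, 19, 10

# Low level: dilated 3x3 convolution, stride 2 (hypothetical hyperparameters).
low_branch = conv_out(low, kernel=3, stride=2, padding=2, dilation=2)

# Middle level: 1x1 convolution changes the channel count, not the size.
mid_branch = conv_out(mid, kernel=1)

# High level: 3x3 deconvolution, stride 2 (hypothetical hyperparameters).
high_branch = deconv_out(high, kernel=3, stride=2, padding=1)

# All branches must agree before they can be concatenated and fused.
sizes = {"low": low_branch, "mid": mid_branch, "high": high_branch}
aligned = len(set(sizes.values())) == 1

# Fusion: a 3x3 convolution with padding 1 preserves the spatial size.
fused = conv_out(mid_branch, kernel=3, padding=1)

print(sizes, "aligned:", aligned, "fused size:", fused)
```

With these assumed settings, each branch produces a 19 x 19 map, so the features can be concatenated along the channel axis and fused by a single convolution. Other hyperparameter combinations would work equally well as long as the three sizes agree.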







Acknowledgments
This work was partially supported by the Shanxi Science Foundation (No. 2015011045). The authors also gratefully acknowledge the reviewers' helpful comments and suggestions, which have improved the presentation.
Cite this article
Qin, P., Li, C., Chen, J. et al. Research on improved algorithm of object detection based on feature pyramid. Multimed Tools Appl 78, 913–927 (2019). https://doi.org/10.1007/s11042-018-5870-3
DOI: https://doi.org/10.1007/s11042-018-5870-3