A fast SSD model based on parameter reduction and dilated convolution

Zhang, Xinliang; Xie, Heng; Zhao, Yunji; Qian, Wei; Xu, Xiaozhuo

doi:10.1007/s11554-021-01108-9


Original Research Paper
Published: 25 April 2021

Volume 18, pages 2211–2224, (2021)

Journal of Real-Time Image Processing

Xinliang Zhang¹ (ORCID: orcid.org/0000-0003-0467-8946), Heng Xie¹, Yunji Zhao¹, Wei Qian¹ & Xiaozhuo Xu¹


Abstract

Deep learning networks typically trade speed against accuracy because of their deep feature extraction. In this paper, we present a modified single shot multibox detector (SSD) model that achieves high speed while maintaining satisfactory accuracy for target detection. First, the number of parameters is reduced by removing convolution layers and reducing the number of channels in the remaining layers. The parameters are thereby reduced by 50% with a permissible precision loss, and the detection speed of the model is significantly improved. Second, a light multiple dilated convolution (LMDC) operator is introduced to compensate for the precision loss. The LMDC functions as a filter that extracts global and semantic information from the feature map, making the feature information more complete and more accurate. Moreover, to reduce the amount of computation and increase the computational efficiency of the network, the feature extraction and fusion of the convolution layer are separated, which transforms the complex multiplication among the parameters into addition. Finally, the LMDC-SSD is evaluated on three datasets with 300 × 300 inputs. On the apple dataset it yields 98.99% mean average precision (mAP) at 85 frames per second; compared to the original model, speed and accuracy are improved by 44% and 8.1%, respectively. On the bicycle and person dataset, accuracy and speed are improved by 0.99% and 65.71%, respectively. On the vehicle dataset, accuracy and speed are improved by 0.26% and 112.9%, respectively. The experimental results show that the proposed LMDC-SSD is promising for detection tasks that require both high speed and high accuracy.
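The abstract does not give the LMDC layout or the exact layers that were removed, so the following is only a generic sketch of the arithmetic behind the ideas it names, not the authors' implementation. It counts the weights of a standard convolution, of a convolution split into a per-channel spatial step and a 1 × 1 channel-fusion step (a depthwise-separable reading of "feature extraction and fusion are separated", which is an assumption here), and the effective kernel size of a dilated convolution. All function names and the example channel counts are hypothetical.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights when spatial extraction and channel fusion are split.

    Assumption: a depthwise k x k step (one filter per input channel)
    followed by a pointwise 1 x 1 step that mixes channels.
    """
    depthwise = c_in * k * k   # spatial feature extraction
    pointwise = c_in * c_out   # channel fusion
    return depthwise + pointwise


def effective_kernel(k: int, dilation: int) -> int:
    """Side length covered by a k x k kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


if __name__ == "__main__":
    # Illustrative channel counts only; they are not taken from the paper.
    full = conv_params(256, 256, 3)
    slim = conv_params(128, 128, 3)      # both channel counts halved
    split = separable_params(256, 256, 3)
    print("standard 3x3, 256->256:", full)
    print("standard 3x3, 128->128:", slim)
    print("separable 3x3, 256->256:", split)
    for d in (1, 2, 4):
        print("dilation", d, "covers", effective_kernel(3, d), "pixels per side")
```

The dilated kernel covers a wider window with the same nine weights per channel pair, which is the general reason dilation is used to gather broader context without adding parameters.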



Acknowledgements

This research was partially supported by the Scientific and Technological Research Projects in Henan Province (212102210244), the Foundation of Henan Educational Committee (21A120004), the Zhongyuan high level talents special support plan (ZYQR201912031), and the Fundamental Research Funds for the Universities of Henan Province (NSFRF170501).

Author information

Authors and Affiliations

School of Electrical Engineering and Automation, Henan Polytechnic University, 2001 Century Avenue, Jiaozuo, China
Xinliang Zhang, Heng Xie, Yunji Zhao, Wei Qian & Xiaozhuo Xu


Corresponding author

Correspondence to Xinliang Zhang.


About this article

Cite this article

Zhang, X., Xie, H., Zhao, Y. et al. A fast SSD model based on parameter reduction and dilated convolution. J Real-Time Image Proc 18, 2211–2224 (2021). https://doi.org/10.1007/s11554-021-01108-9


Received: 22 August 2020
Accepted: 13 April 2021
Published: 25 April 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s11554-021-01108-9
