Abstract
Deep learning networks typically trade speed against accuracy because of their deep feature extraction. In this paper, we present a modified single shot multibox detector (SSD) model that achieves high speed while maintaining satisfactory accuracy for target detection. First, the number of parameters is reduced by deleting convolution layers and reducing the channels within the remaining layers. The parameters are thereby reduced by 50% with a permissible precision loss, and the detection speed of the model is significantly improved. Second, a light multiple dilated convolution (LMDC) operator is introduced to compensate for the precision loss. The LMDC functions as a filter that extracts global and semantic information from the feature map, making the feature information more complete and more accurate. Moreover, to reduce the amount of computation and increase the computational efficiency of the network, the feature extraction and feature fusion of the convolution layer are separated, which transforms the multiplication among the parameters into addition. Finally, the LMDC-SSD is evaluated on three datasets with 300 × 300 inputs. It yields 98.99% mean average precision (mAP) at 85 frames per second for the apple datasets. The speed and accuracy are improved by 44% and 8.1%, respectively, compared to the original model. The speed and accuracy are improved by 0.99% and 65.71%, respectively, for the bicycle and person datasets. The speed and accuracy are improved by 0.26% and 112.9%, respectively, for the vehicle datasets. The experimental results show that the proposed LMDC-SSD is promising for detection tasks that require both high speed and high accuracy.
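The abstract does not give the exact layer layout, so the following is only a minimal sketch of the two general ideas it names, under stated assumptions. It assumes that "separating feature extraction and fusion" behaves like a per-channel k × k convolution followed by a 1 × 1 fusion convolution, which turns the product k·k·C_in·C_out into the sum k·k·C_in + C_in·C_out, and it uses the standard formula for the span of a dilated kernel. The function names and the layer sizes are hypothetical and are not taken from the paper.

```python
def standard_conv_params(k, c_in, c_out):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separated_conv_params(k, c_in, c_out):
    """Weight count when per-channel k x k extraction is followed by
    1 x 1 fusion across channels (assumed form of the separation)."""
    return k * k * c_in + c_in * c_out


def effective_kernel_size(k, dilation):
    """Span covered along one axis by a k-tap kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


if __name__ == "__main__":
    # Hypothetical layer sizes, chosen only for illustration.
    k, c_in, c_out = 3, 256, 256
    full = standard_conv_params(k, c_in, c_out)
    split = separated_conv_params(k, c_in, c_out)
    print(f"standard: {full} weights, separated: {split} weights")
    for rate in (1, 2, 3):
        print(f"dilation {rate}: effective kernel size {effective_kernel_size(k, rate)}")
```

Under these assumptions, the separated form needs far fewer weights than the standard one, and larger dilation rates widen the region each filter sees without adding weights, which is consistent with the abstract's claim that dilated convolutions recover global context at low cost.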











Acknowledgements
This research was partially supported by the Scientific and Technological Research Projects in Henan Province (212102210244), the Foundation of Henan Educational Committee (21A120004), the Zhongyuan High Level Talents Special Support Plan (ZYQR201912031), and the Fundamental Research Funds for the Universities of Henan Province (NSFRF170501).
Cite this article
Zhang, X., Xie, H., Zhao, Y. et al. A fast SSD model based on parameter reduction and dilated convolution. J Real-Time Image Proc 18, 2211–2224 (2021). https://doi.org/10.1007/s11554-021-01108-9
DOI: https://doi.org/10.1007/s11554-021-01108-9