
Micro-YOLO+: Searching Optimal Methods for Compressing Object Detection Model Based on Speed, Size, Cost, and Accuracy

  • Original Research
  • Published in: SN Computer Science

Abstract

Convolutional neural networks play a central role in solving the object detection problem. However, conventional object detection models, such as YOLO and SSD, are usually too large to be deployed on embedded devices, whose resources and power budgets are restricted. In this paper, several efficient methods are explored to balance model size, network accuracy, and inference speed. We explore alternative lightweight convolutional layers to replace the standard convolutional layers (Conv) in the YOLOv3-tiny network, including the depth-wise separable convolution (DSConv), the mobile inverted bottleneck convolution with squeeze-and-excitation block (MBConv), and the ghost module (GConv). We also search for the optimal hyper-parameters of the network and adopt an improved NMS algorithm, Cluster-NMS. Building on these methods, we propose a new object detection model, Micro-YOLO+, which achieves a significant reduction in the number of parameters and computation cost while maintaining performance. Compared to the original YOLOv3-tiny network, Micro-YOLO+ reduces the number of parameters by 3.18\(\times\) and multiply-accumulate operations (MACs) by 2.44\(\times\) while increasing the mAP evaluated on the COCO2014 dataset by 1.6%.
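To illustrate why replacing standard convolutions with depth-wise separable ones shrinks both the parameter count and the MAC count, here is a minimal back-of-the-envelope sketch. It is not code from the paper; the layer sizes (64 input channels, 128 output channels, a 26\(\times\)26 feature map) are hypothetical values chosen only for illustration.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (no bias term)."""
    return c_in * c_out * k * k

def dsconv_params(c_in, c_out, k):
    """Depth-wise separable conv: a per-channel k x k (depth-wise) conv
    followed by a 1x1 (point-wise) conv that mixes channels."""
    return c_in * k * k + c_in * c_out

def conv_macs(params, h, w):
    """Multiply-accumulate ops for a stride-1, 'same'-padded conv:
    each weight is applied once per output spatial position."""
    return params * h * w

# Hypothetical layer: 3x3, 64 -> 128 channels, on a 26x26 feature map.
std = conv_params(64, 128, 3)       # 64 * 128 * 9 = 73728
ds = dsconv_params(64, 128, 3)      # 64 * 9 + 64 * 128 = 8768
print(std / ds)                     # ~8.4x fewer parameters
print(conv_macs(std, 26, 26) / conv_macs(ds, 26, 26))  # same ratio in MACs
```

Because the whole-network ratio depends on which layers are replaced and on the extra point-wise convolutions introduced, the end-to-end reduction reported for Micro-YOLO+ (3.18\(\times\) parameters, 2.44\(\times\) MACs) is smaller than this single-layer figure.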



Funding

This study was funded in part by the National Key Research and Development Program of China under Grant No. 2019YFB2204500 and in part by the Science, Technology and Innovation Action Plan of Shanghai Municipality, China under Grant No. 1914220370.

Author information


Corresponding author

Correspondence to Yongfu Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Agents and Artificial Intelligence” guest edited by Jaap van den Herik, Ana Paula Rocha and Luc Steels.


About this article


Cite this article

Hu, L., Zhang, Y., Zhao, Y. et al. Micro-YOLO+: Searching Optimal Methods for Compressing Object Detection Model Based on Speed, Size, Cost, and Accuracy. SN COMPUT. SCI. 3, 391 (2022). https://doi.org/10.1007/s42979-022-01299-3

