Abstract
Convolutional neural networks play a major role in object detection. However, conventional object detection models, such as YOLO and SSD, are usually too large to deploy on embedded devices, which have restricted resources and low-power requirements. In this paper, we explore several efficient methods to balance model size, network accuracy, and inference speed. We explore alternative lightweight convolutional layers to replace the convolutional layers (Conv) in the YOLOv3-tiny network, including the depth-wise separable convolution (DSConv), the mobile inverted bottleneck convolution with squeeze-and-excitation block (MBConv), and the ghost module (GConv). We also search for the optimal hyper-parameters of the network and adopt an improved NMS algorithm, Cluster-NMS. Finally, we propose a new object detection model, Micro-YOLO+, which achieves a significant reduction in the number of parameters and computation cost while maintaining detection performance. Compared to the original YOLOv3-tiny network, Micro-YOLO+ reduces the number of parameters by 3.18\(\times\) and the number of multiply-accumulate operations (MACs) by 2.44\(\times\) while increasing the mAP evaluated on the COCO2014 dataset by 1.6%.
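To give a rough sense of why lightweight layers shrink a model, the sketch below counts the weights of a standard convolution, a depth-wise separable convolution, and a ghost module using their textbook formulas. It is an illustration only, not the configuration used in Micro-YOLO+: the function names, the layer size (256 input channels, 512 output channels, 3×3 kernel), and the ghost-module settings `s` (ratio of output to intrinsic channels) and `d` (kernel size of the cheap depth-wise operation) are all assumptions, and biases and batch-norm parameters are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def dsconv_params(c_in, c_out, k):
    """Weights in a depth-wise k x k conv followed by a 1 x 1 point-wise conv."""
    depthwise = k * k * c_in      # one k x k filter per input channel
    pointwise = c_in * c_out      # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def ghost_params(c_in, c_out, k, s=2, d=3):
    """Weights in a ghost module, assuming c_out is divisible by s.

    A k x k primary conv produces c_out / s intrinsic feature maps; cheap
    d x d depth-wise operations generate the remaining (s - 1) * c_out / s maps.
    """
    intrinsic = c_out // s
    primary = k * k * c_in * intrinsic
    cheap = d * d * intrinsic * (s - 1)
    return primary + cheap


if __name__ == "__main__":
    # Hypothetical layer size, chosen only for illustration.
    c_in, c_out, k = 256, 512, 3
    base = conv_params(c_in, c_out, k)
    for name, count in [
        ("Conv", base),
        ("DSConv", dsconv_params(c_in, c_out, k)),
        ("GConv", ghost_params(c_in, c_out, k)),
    ]:
        print(f"{name:7s} {count:>9d} weights  ({base / count:.2f}x smaller than Conv)")
```

Per-layer savings like these do not translate directly into the whole-network figures quoted above, since only some layers are replaced and accuracy constrains how aggressively each one can be compressed.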




Funding
This study was funded in part by the National Key Research and Development Program of China under Grant No. 2019YFB2204500 and in part by the Science, Technology and Innovation Action Plan of Shanghai Municipality, China under Grant No. 1914220370.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Agents and Artificial Intelligence” guest edited by Jaap van den Herik, Ana Paula Rocha and Luc Steels.
Cite this article
Hu, L., Zhang, Y., Zhao, Y. et al. Micro-YOLO+: Searching Optimal Methods for Compressing Object Detection Model Based on Speed, Size, Cost, and Accuracy. SN COMPUT. SCI. 3, 391 (2022). https://doi.org/10.1007/s42979-022-01299-3
DOI: https://doi.org/10.1007/s42979-022-01299-3