Abstract
Accurate real-time object detection plays a key role in practical scenarios such as autonomous driving and UAV surveillance. However, the limited memory and computing power of edge devices hinder the deployment of high-performance Convolutional Neural Networks (CNNs). Iterative channel pruning is an effective way to obtain lightweight networks, but in existing methods both the channel importance measurement and the iterative pruning pipeline are suboptimal. In this paper, we measure channel importance by jointly considering the scale factor of batch normalization (BN) and the kernel weights of the convolutional layers. In addition, sparsity training and fine-tuning are combined to simplify the pruning pipeline. Notably, cosine decay of the sparsity coefficient and a soft mask strategy are used to optimize our compact model, Pruned-YOLOv3/v5, which is constructed by pruning YOLOv3/v5. Experimental results on the MS-COCO and VisDrone datasets show that the proposed model achieves a satisfactory balance between computational efficiency and detection accuracy.
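The two ingredients named in the abstract can be sketched in a few lines: a channel-importance score that combines the BN scale factor with the norm of the corresponding convolution kernel, and a sparsity coefficient that follows a cosine decay over training. This is a minimal NumPy illustration under assumed formulations (the product of |γ| and the kernel's L2 norm, and a decay from an initial value to zero); the function names and the paper's exact definitions may differ.

```python
import numpy as np

def channel_importance(gamma, conv_weight):
    """Score each output channel by |BN scale factor| times the
    L2 norm of its convolution kernel.

    gamma:       BN scale factors, shape (C_out,)
    conv_weight: kernel weights, shape (C_out, C_in, k, k)
    """
    kernel_norm = np.sqrt((conv_weight ** 2).sum(axis=(1, 2, 3)))
    return np.abs(gamma) * kernel_norm

def sparsity_coefficient(step, total_steps, lam0=1e-3):
    """Cosine-decay the sparsity-regularization coefficient
    from lam0 at step 0 down to 0 at total_steps."""
    return 0.5 * lam0 * (1.0 + np.cos(np.pi * step / total_steps))

# Example: channels whose scores fall in the lowest fraction are
# candidates for (soft) masking rather than immediate removal.
scores = channel_importance(np.array([0.9, 0.05, 0.4]),
                            np.random.randn(3, 16, 3, 3))
prune_candidates = np.argsort(scores)[: len(scores) // 3]
```

With a soft mask strategy, the low-scoring channels selected this way would be zeroed but kept in the graph during sparsity training, so they can recover if their importance rises before the final pruning step.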
This work is supported by the Chinese National Natural Science Foundation (62076033, U1931202) and the BUPT innovation and entrepreneurship support program (2021-YC-T026).
© 2021 Springer Nature Switzerland AG
Zhang, J., Wang, P., Zhao, Z., Su, F. (2021). Pruned-YOLO: Learning Efficient Object Detector Using Model Pruning. In: Farkaš, I., Masulli, P., Otte, S., Wermter, S. (eds) Artificial Neural Networks and Machine Learning – ICANN 2021. ICANN 2021. Lecture Notes in Computer Science(), vol 12894. Springer, Cham. https://doi.org/10.1007/978-3-030-86380-7_4