Abstract
Logistics parcel detection technology is critical for unmanned sorting. YOLOv5, a state-of-the-art (SOTA) object detection model, is a classic network widely used in engineering. However, while it can detect logistics parcels quickly and accurately, it suffers from a high computational load and a large parameter count. To address these issues, this paper proposes a two-scale lightweight deep learning model named SFYOLOv5. A lightweight feature extraction module called Pruned-Shuffle-Block (PSB) is proposed. Meanwhile, a double-layer pyramid structure for medium and large target detection is designed in accordance with the image size distribution of logistics parcels. With this structure, the floating point operations (FLOPs) and parameters in feature extraction are significantly reduced. In addition, a downsampling module named Focus For Downsampling (FFD) is designed, and attention modules are introduced to extract high-level semantic information from logistics parcels. These modules not only compensate for the loss caused by downsampling but also improve the mean Average Precision (mAP). Finally, comparison experiments are performed on a self-built logistics parcel dataset. The results show that the mAP of the model reaches 99.1%, while the number of parameters decreases by 92.14% and the FLOPs decrease by 89.57% compared with the existing SOTA model. This model can be used for intelligent logistics parcel sorting.
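The FFD module mentioned above builds on the slicing ("Focus") operation popularized by YOLOv5, which rearranges spatial pixels into channels so that resolution is halved without discarding information. As a rough illustration only (not the authors' exact FFD implementation, which the abstract does not specify), the core space-to-depth step can be sketched in NumPy:

```python
import numpy as np

def focus_downsample(x):
    """Space-to-depth slicing as in YOLOv5's Focus layer: split an
    image into four pixel-interleaved sub-images and stack them along
    the channel axis. (C, H, W) -> (4C, H/2, W/2), lossless."""
    # x has shape (C, H, W); H and W must be even
    return np.concatenate([
        x[:, 0::2, 0::2],  # even rows, even columns
        x[:, 1::2, 0::2],  # odd rows, even columns
        x[:, 0::2, 1::2],  # even rows, odd columns
        x[:, 1::2, 1::2],  # odd rows, odd columns
    ], axis=0)

x = np.arange(3 * 4 * 4).reshape(3, 4, 4)  # toy 3-channel 4x4 "image"
y = focus_downsample(x)
print(y.shape)  # (12, 2, 2)
```

Because every input pixel survives in some channel of the output, a subsequent convolution can recover fine detail that a strided convolution or pooling layer would have discarded, which is consistent with the abstract's claim that FFD compensates for downsampling loss.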
Data availability
The datasets used in this paper involve company privacy and cannot be disclosed publicly, but other data may be obtained from the corresponding author upon reasonable request.
Funding
This study was funded by the Natural Science Foundation of Fujian Province (2020J05236), the Fujian Science and Technology Plan STS Project (2021T3069), the Xiamen Key Laboratory of Intelligent Manufacturing Equipment, and the Scientific Research Start-up Project of Xiamen University of Technology (YKJ20006R).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Zhang, G., Kong, Y., Li, W. et al. Lightweight deep learning model for logistics parcel detection. Vis Comput 40, 2751–2759 (2024). https://doi.org/10.1007/s00371-023-02982-z