Abstract
Logistics parcel detection technology is critical for unmanned sorting. YOLOv5, a state-of-the-art (SOTA) object detection model, is a classic network widely used in engineering. However, while it can detect logistics parcels quickly and accurately, it suffers from a high computational load and a large parameter count. To address these issues, this paper proposes a two-scale lightweight deep learning model named SFYOLOv5. A lightweight feature extraction module called Pruned-Shuffle-Block (PSB) is proposed. Meanwhile, a double-layer pyramid structure for medium and large target detection is designed in accordance with the image size distribution of logistics parcels. With this structure, the floating point operations (FLOPs) and parameters in feature extraction are significantly reduced. In addition, a downsampling module named Focus For Downsampling (FFD) is designed, and attention modules are introduced to extract high-level semantic information from logistics parcels. These modules not only compensate for the loss caused by downsampling but also improve the mean Average Precision (mAP). Finally, comparison experiments are performed on a self-built logistics parcel dataset. The results show that the mAP of the model reaches 99.1%, while the number of parameters decreases by 92.14% and the FLOPs decrease by 89.57% compared with the existing SOTA model. This model can be used for intelligent logistics parcel sorting.
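The FFD module mentioned above builds on the slicing ("Focus") operation popularized by YOLOv5, which rearranges spatial pixels into channels so that resolution is halved without discarding information. As a rough illustration only (not the authors' exact FFD implementation, which the abstract does not specify), the core space-to-depth step can be sketched in NumPy:

```python
import numpy as np

def focus_downsample(x):
    """Space-to-depth slicing as in YOLOv5's Focus layer: split an
    image into four pixel-interleaved sub-images and stack them along
    the channel axis. (C, H, W) -> (4C, H/2, W/2), lossless."""
    # x has shape (C, H, W); H and W must be even
    return np.concatenate([
        x[:, 0::2, 0::2],  # even rows, even columns
        x[:, 1::2, 0::2],  # odd rows, even columns
        x[:, 0::2, 1::2],  # even rows, odd columns
        x[:, 1::2, 1::2],  # odd rows, odd columns
    ], axis=0)

x = np.arange(3 * 4 * 4).reshape(3, 4, 4)  # toy 3-channel 4x4 "image"
y = focus_downsample(x)
print(y.shape)  # (12, 2, 2)
```

Because every input pixel survives in some channel of the output, a subsequent convolution can recover fine detail that a strided convolution or pooling layer would have discarded, which is consistent with the abstract's claim that FFD compensates for downsampling loss.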
Data availability
The datasets used in this paper involve company privacy and cannot be disclosed publicly, but other data may be obtained from the corresponding author upon reasonable request.
Funding
This study was funded by the Natural Science Foundation of Fujian Province (2020J05236), the Fujian Science and Technology Plan STS Project (2021T3069), the Xiamen Key Laboratory of Intelligent Manufacturing Equipment, and the Scientific Research Start-up Project of Xiamen University of Technology (YKJ20006R).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Zhang, G., Kong, Y., Li, W. et al. Lightweight deep learning model for logistics parcel detection. Vis Comput 40, 2751–2759 (2024). https://doi.org/10.1007/s00371-023-02982-z