CAL-SSD: lightweight SSD object detection based on coordinated attention

  • Original Paper
  • Signal, Image and Video Processing

Abstract

Although existing object detection algorithms achieve excellent accuracy, each gain in accuracy has typically come with more parameters and higher model complexity, which makes these algorithms difficult to deploy on edge and mobile devices. To make object detection more practical on such devices, this paper proposes CAL-SSD, a lightweight object detection algorithm built on a coordinate attention (CA) mechanism. First, we embed coordinate attention into MobileNetV2 to form CA_MobileNetv2, which serves as the backbone of CAL-SSD; this significantly reduces the model parameters and complexity and improves the network's ability to distinguish objects from the background. Second, we design a super-resolution feature fusion module (SFFM) that introduces deep semantic information into shallow feature maps. Third, we replace the traditional 3×3 convolutions in the additional feature layers and detection heads with depthwise separable convolutions, further reducing the number of parameters. Finally, we use BiFPN to construct a new feature pyramid that makes full use of the multi-scale features of targets. Experimental results on the PASCAL VOC and MS COCO datasets show that CAL-SSD significantly reduces model parameters and complexity while achieving an optimal balance between speed and accuracy.
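The coordinate attention mechanism comes from Hou et al. (reference 9). As a rough illustration only, the NumPy sketch below runs the forward pass of a generic coordinate attention block on a single (C, H, W) feature map; it is not the CAL-SSD authors' CA_MobileNetv2 code. Batch normalization and biases are omitted, the weights are random, and the reduced width M = max(8, C // r) is an assumption made for the example.

```python
import numpy as np

def h_swish(x):
    # Hard-swish non-linearity: x * ReLU6(x + 3) / 6
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def coordinate_attention(x, w1, w_h, w_w):
    """Forward pass of a coordinate attention block (sketch, no BN or bias).

    x   : (C, H, W) input feature map
    w1  : (M, C) shared 1x1 conv that reduces C channels to M
    w_h : (C, M) 1x1 conv producing the height-direction attention
    w_w : (C, M) 1x1 conv producing the width-direction attention
    """
    C, H, W = x.shape
    # 1) Directional pooling: average over width -> (C, H), over height -> (C, W)
    z_h = x.mean(axis=2)
    z_w = x.mean(axis=1)
    # 2) Concatenate along the spatial axis and apply the shared 1x1 conv
    y = np.concatenate([z_h, z_w], axis=1)  # (C, H + W)
    f = h_swish(w1 @ y)                     # (M, H + W)
    # 3) Split back into the two directions
    f_h, f_w = f[:, :H], f[:, H:]
    # 4) Per-direction 1x1 convs + sigmoid give attention weights in (0, 1)
    g_h = sigmoid(w_h @ f_h)                # (C, H)
    g_w = sigmoid(w_w @ f_w)                # (C, W)
    # 5) Reweight: out[c, i, j] = x[c, i, j] * g_h[c, i] * g_w[c, j]
    return x * g_h[:, :, None] * g_w[:, None, :]

rng = np.random.default_rng(0)
C, H, W, r = 32, 10, 12, 8
M = max(8, C // r)  # assumed reduced width
x = rng.standard_normal((C, H, W))
out = coordinate_attention(x,
                           rng.standard_normal((M, C)) * 0.1,
                           rng.standard_normal((C, M)) * 0.1,
                           rng.standard_normal((C, M)) * 0.1)
print(out.shape)  # (32, 10, 12)
```

Because the attention is factorized into a height vector and a width vector per channel, the block keeps positional information along each axis while adding only a few small 1×1 convolutions, which is what makes it attractive for a mobile backbone.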
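The abstract does not describe the internals of SFFM. One common way to "super-resolve" a deep feature map is sub-pixel convolution (Shi et al., reference 25). The sketch below assumes, purely for illustration, that a deep map is expanded by a 1×1 convolution, upsampled by pixel shuffle, and added element-wise to a shallow map; the function `sffm_sketch`, the addition-based fusion, and all shapes are hypothetical and not taken from the paper.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange (C*r*r, H, W) -> (C, H*r, W*r), the sub-pixel convolution step.

    Ordering: out[c, h*r + i, w*r + j] = x[c*r*r + i*r + j, h, w]
    """
    crr, h, w = x.shape
    c = crr // (r * r)
    x = x.reshape(c, r, r, h, w)    # split channel index into (c, i, j)
    x = x.transpose(0, 3, 1, 4, 2)  # -> (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)

def sffm_sketch(shallow, deep, w_expand, r=2):
    """Hypothetical fusion: upsample a deep map by sub-pixel conv, add to a shallow map.

    shallow  : (C_s, H, W) shallow feature map
    deep     : (C_d, H // r, W // r) deeper, semantically richer feature map
    w_expand : (C_s * r * r, C_d) weights of a 1x1 conv (bias omitted)
    """
    c_d, h_d, w_d = deep.shape
    # 1x1 conv as a matrix product over the channel axis
    expanded = (w_expand @ deep.reshape(c_d, -1)).reshape(-1, h_d, w_d)
    return shallow + pixel_shuffle(expanded, r)

rng = np.random.default_rng(0)
shallow = rng.standard_normal((16, 20, 20))
deep = rng.standard_normal((64, 10, 10))
w_expand = rng.standard_normal((16 * 2 * 2, 64)) * 0.05
fused = sffm_sketch(shallow, deep, w_expand)
print(fused.shape)  # (16, 20, 20)
```

Unlike interpolation, pixel shuffle upsamples by rearranging learned channels into space, so the upsampling itself is learnable at the cost of one 1×1 convolution.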
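To see why depthwise separable convolution shrinks the additional feature layers and detection heads, it helps to compare parameter counts. The helpers below are illustrative, and the 512-channel layer is a hypothetical example rather than a layer taken from CAL-SSD.

```python
def conv_params(c_in, c_out, k=3, bias=False):
    # Standard k x k convolution: every output channel sees every input channel
    return k * k * c_in * c_out + (c_out if bias else 0)

def dw_separable_params(c_in, c_out, k=3, bias=False):
    # Depthwise k x k conv (one filter per input channel) followed by a pointwise 1x1 conv
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise

c_in, c_out = 512, 512  # hypothetical layer width
std = conv_params(c_in, c_out)
dws = dw_separable_params(c_in, c_out)
print(std, dws, round(std / dws, 2))  # prints: 2359296 266752 8.84
```

Without biases the ratio is k²·c_out / (k² + c_out) = 1 / (1/c_out + 1/k²), so for 3×3 kernels the saving approaches 9× as the number of output channels grows.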
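BiFPN (Tan et al., reference 26) fuses features from different pyramid levels with learnable, non-negative weights ("fast normalized fusion"). The sketch below shows only that fusion step on a hypothetical top-down node that merges P4 with an upsampled P5; the separable convolution, batch normalization, and activation that follow each fusion in BiFPN are omitted, and the weights are fixed numbers instead of trained parameters.

```python
import numpy as np

def fast_normalized_fusion(inputs, weights, eps=1e-4):
    """Weighted feature fusion in the style of BiFPN.

    inputs  : list of feature maps that already share the same shape
    weights : one learnable scalar per input (plain floats here)
    """
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # ReLU keeps weights non-negative
    w = w / (w.sum() + eps)                                # normalize so weights sum to ~1
    return sum(wi * fi for wi, fi in zip(w, inputs))

def upsample2x(x):
    # Nearest-neighbour 2x upsampling of a (C, H, W) map
    return x.repeat(2, axis=1).repeat(2, axis=2)

# Hypothetical top-down node: fuse P4 with upsampled P5
p4 = np.ones((64, 20, 20))
p5 = np.full((64, 10, 10), 2.0)
p4_td = fast_normalized_fusion([p4, upsample2x(p5)], weights=[1.0, 3.0])
print(p4_td.shape, round(float(p4_td[0, 0, 0]), 3))  # (64, 20, 20) 1.75
```

Normalizing by the sum rather than with a softmax keeps the fusion cheap while still letting the network learn how much each scale contributes.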

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11

References

  1. Cao, J., Bao, W., Shang, H., Yuan, M., Cheng, Q.: GCL-YOLO: a GhostConv-based lightweight YOLO network for UAV small object detection. Remote Sensing 15(20), 4932 (2023)

  2. Cao, Y., Li, C., Peng, Y., Ru, H.: MCS-YOLO: a multiscale object detection method for autonomous driving road environment recognition. IEEE Access 11, 22342–22354 (2023)

  3. Ding, P., Qian, H., Bao, J., Zhou, Y., Yan, S.: L-YOLOv4: lightweight YOLOv4 based on modified RFB-s and depthwise separable convolution for multi-target detection in complex scenes. J. Real-Time Image Proc. 20(4), 71 (2023)

  4. Ding, P., Qian, H., Chu, S.: SlimYOLOv4: lightweight object detector based on YOLOv4. J. Real-Time Image Proc. 19(3), 487–498 (2022)

  5. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)

  6. Han, J., Yang, Y.: L-Net: lightweight and fast object detector-based ShuffleNetV2. J. Real-Time Image Proc. 18(6), 2527–2538 (2021)

  7. Han, K., Wang, Y., Tian, Q., Guo, J., Xu, C., Xu, C.: GhostNet: More features from cheap operations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1580–1589 (2020)

  8. He, J., Chen, Y., Wang, N., Zhang, Z.: 3D video object detection with learnable object-centric global optimization. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 5106–5115 (2023)

  9. Hou, Q., Zhou, D., Feng, J.: Coordinate attention for efficient mobile network design. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 13713–13722 (2021)

  10. Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7132–7141 (2018)

  11. Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv preprint arXiv:1602.07360 (2016)

  12. Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning, pp. 448–456. PMLR (2015)

  13. Jiang, Z., Zhao, L., Li, S., Jia, Y.: Real-time object detection method based on improved YOLOv4-tiny. arXiv preprint arXiv:2011.04244 (2020)

  14. Kaur, J., Singh, W.: A systematic review of object detection from images using deep learning. Multimedia Tools Appl. 83(4), 12253–12338 (2024)

  15. Lampert, C.H., Blaschko, M.B., Hofmann, T.: Beyond sliding windows: Object localization by efficient subwindow search. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2008)

  16. Li, L., Li, B., Zhou, H.: Lightweight multi-scale network for small object detection. PeerJ Computer Sci. 8, e1145 (2022)

  17. Li, Z., Peng, C., Yu, G., Zhang, X., Deng, Y., Sun, J.: DetNet: A backbone network for object detection. arXiv preprint arXiv:1804.06215 (2018)

  18. Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)

  19. Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8759–8768 (2018)

  20. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., Berg, A.C.: SSD: Single shot multibox detector. In: European Conference on Computer Vision, pp. 21–37. Springer (2016)

  21. Qian, H., Wang, H.: Lightweight object detection based on super-resolution. In: 2022 China Automation Congress (CAC), pp. 2493–2498. IEEE (2022)

  22. Qian, H., Wang, H., Feng, S., Yan, S.: FESSD: SSD target detection based on feature fusion and feature enhancement. J. Real-Time Image Proc. 20(1), 2 (2023)

  23. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)

  24. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: Inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4510–4520 (2018)

  25. Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A.P., Bishop, R., Rueckert, D., Wang, Z.: Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1874–1883 (2016)

  26. Tan, M., Pang, R., Le, Q.V.: EfficientDet: Scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10781–10790 (2020)

  27. Wang, H., Qian, H., Feng, S., Wang, W.: L-SSD: lightweight SSD target detection based on depth-separable convolution. J. Real-Time Image Proc. 21(2), 1–15 (2024)

  28. Wen, L., Cheng, Y., Fang, Y., Li, X.: A comprehensive survey of oriented object detection in remote sensing images. Expert Syst. Appl. 224, 119960 (2023)

  29. Woo, S., Park, J., Lee, J.Y., Kweon, I.S.: CBAM: Convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19 (2018)

  30. Wu, B., Iandola, F., Jin, P.H., Keutzer, K.: SqueezeDet: Unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 129–137 (2017)

  31. Yang, J., Jiang, J.: Dilated-CBAM: An efficient attention network with dilated convolution. In: 2021 IEEE International Conference on Unmanned Systems (ICUS), pp. 11–15. IEEE (2021)

  32. Zeng, N., Wu, P., Wang, Z., Li, H., Liu, W., Liu, X.: A small-sized object detection oriented multi-scale feature fusion approach with application to defect detection. IEEE Trans. Instrum. Meas. 71, 1–14 (2022)

  33. Zhang, Y., Bi, S., Dong, M., Liu, Y.: The implementation of CNN-based object detector on ARM embedded platforms. In: 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), pp. 379–382. IEEE (2018)

  34. Zhong, X., Wang, M., Liu, W., Yuan, J., Huang, W.: SCPNet: Self-constrained parallelism network for keypoint-based lightweight object detection. J. Vis. Commun. Image Represent. 90, 103719 (2023)

Author information

Correspondence to Xin Zhong.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhong, X. CAL-SSD: lightweight SSD object detection based on coordinated attention. SIViP 19, 31 (2025). https://doi.org/10.1007/s11760-024-03716-x

  • DOI: https://doi.org/10.1007/s11760-024-03716-x