Abstract
Existing object detection algorithms achieve excellent accuracy, but as accuracy has improved, models have grown in parameter count and complexity, making them difficult to deploy on edge and mobile devices. To make object detection more practical on such devices, this paper proposes CAL-SSD, a lightweight object detection algorithm that uses a coordinated attention mechanism. First, we embed the coordinated attention mechanism into MobileNetv2 to form CA_MobileNetv2, the backbone of CAL-SSD, which significantly reduces the model parameters and complexity and improves the network’s ability to distinguish objects from background. Second, we design a super-resolution feature fusion module (SFFM) to introduce deep semantic information into shallow feature maps. Third, we replace traditional 3×3 convolution with depthwise separable convolution when constructing the additional feature layers and detection heads, further reducing the model parameters. Finally, we employ BiFPN to construct a new feature pyramid that makes full use of the target’s multi-scale features. Experimental results on the PASCAL VOC and MS COCO datasets show that CAL-SSD significantly reduces the model parameters and complexity while achieving a favorable balance between speed and accuracy.
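To give a rough sense of why swapping a traditional 3×3 convolution for a depthwise separable convolution reduces parameters, the sketch below counts weights for both layer types. It is an illustration only, not the CAL-SSD implementation: the function names are hypothetical, biases are ignored, and the channel sizes (256 in, 512 out) are assumed for the example rather than taken from the paper.

```python
def standard_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weight count of a k x k depthwise convolution followed by a
    1 x 1 pointwise convolution (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def reduction_ratio(c_in: int, c_out: int, k: int = 3) -> float:
    """Separable / standard weight ratio; algebraically 1/c_out + 1/k**2."""
    return depthwise_separable_params(c_in, c_out, k) / standard_conv_params(c_in, c_out, k)


if __name__ == "__main__":
    # Assumed channel sizes, chosen only for illustration.
    c_in, c_out = 256, 512
    print(standard_conv_params(c_in, c_out))        # 1179648
    print(depthwise_separable_params(c_in, c_out))  # 133376
    print(round(reduction_ratio(c_in, c_out), 4))   # 0.1131
```

Under these assumed sizes the separable layer needs roughly one ninth of the weights. The ratio approaches 1/9 for 3×3 kernels as the number of output channels grows, since the 1/c_out term becomes negligible.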
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03716-x/MediaObjects/11760_2024_3716_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03716-x/MediaObjects/11760_2024_3716_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03716-x/MediaObjects/11760_2024_3716_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03716-x/MediaObjects/11760_2024_3716_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03716-x/MediaObjects/11760_2024_3716_Fig5_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03716-x/MediaObjects/11760_2024_3716_Fig6_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03716-x/MediaObjects/11760_2024_3716_Fig7_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03716-x/MediaObjects/11760_2024_3716_Fig8_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03716-x/MediaObjects/11760_2024_3716_Fig9_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03716-x/MediaObjects/11760_2024_3716_Fig10_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03716-x/MediaObjects/11760_2024_3716_Fig11_HTML.jpg)
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Zhong, X. CAL-SSD: lightweight SSD object detection based on coordinated attention. SIViP 19, 31 (2025). https://doi.org/10.1007/s11760-024-03716-x
DOI: https://doi.org/10.1007/s11760-024-03716-x