
A small object detection architecture with concatenated detection heads and multi-head mixed self-attention mechanism

  • Research
  • Published in Journal of Real-Time Image Processing

Abstract

A novel detection method is proposed to address the challenge of detecting small objects. The method augments the YOLOv8n architecture with a dedicated small-object detection layer and introduces a Concat detection head to extract features more effectively. A new attention mechanism, Multi-Head Mixed Self-Attention (MMSA), is also added to strengthen the feature-extraction capability of the backbone. To improve detection sensitivity for small objects, the localization loss combines the Normalized Wasserstein Distance (NWD) with Intersection over Union (IoU), optimizing bounding-box regression. Experimental results on the TT100K dataset show that the mean average precision (mAP@0.5) reaches 88.1%, a 13.5% improvement over YOLOv8n. The method's versatility is further validated on the BDD100K dataset, where it is compared against a range of object-detection algorithms. The results demonstrate that the method delivers significant improvements and practical value for small-object detection. Code is available at https://github.com/CodeSworder/MMSA.
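The loss design described above can be sketched as follows. This is a minimal NumPy sketch of the idea, not the authors' implementation (see the linked repository for that): the normalizing constant `c`, the mixing weight `alpha`, and the exact blending form are assumptions, following the NWD formulation of Wang et al., in which each box is modelled as a 2D Gaussian and the distance is mapped to a similarity via an exponential.

```python
import numpy as np

def nwd(box1, box2, c=12.8):
    """Normalized Wasserstein Distance between two (cx, cy, w, h) boxes.

    Each box is modelled as a 2D Gaussian N([cx, cy], diag(w^2/4, h^2/4));
    the squared 2-Wasserstein distance between such Gaussians reduces to the
    squared Euclidean distance between the (cx, cy, w/2, h/2) vectors.
    The constant c is a dataset-dependent normalizer (an assumption here).
    """
    p1 = np.array([box1[0], box1[1], box1[2] / 2.0, box1[3] / 2.0])
    p2 = np.array([box2[0], box2[1], box2[2] / 2.0, box2[3] / 2.0])
    w2_squared = np.sum((p1 - p2) ** 2)
    return float(np.exp(-np.sqrt(w2_squared) / c))

def iou(box1, box2):
    """Plain IoU for (cx, cy, w, h) boxes."""
    x1a, y1a = box1[0] - box1[2] / 2, box1[1] - box1[3] / 2
    x2a, y2a = box1[0] + box1[2] / 2, box1[1] + box1[3] / 2
    x1b, y1b = box2[0] - box2[2] / 2, box2[1] - box2[3] / 2
    x2b, y2b = box2[0] + box2[2] / 2, box2[1] + box2[3] / 2
    iw = max(0.0, min(x2a, x2b) - max(x1a, x1b))
    ih = max(0.0, min(y2a, y2b) - max(y1a, y1b))
    inter = iw * ih
    union = box1[2] * box1[3] + box2[2] * box2[3] - inter
    return inter / union if union > 0 else 0.0

def localization_loss(pred, target, alpha=0.5):
    """Hypothetical blend of the NWD and IoU localization terms."""
    return alpha * (1.0 - nwd(pred, target)) + (1.0 - alpha) * (1.0 - iou(pred, target))
```

Because NWD compares boxes as Gaussians rather than by overlap area, it remains smooth and informative even when small boxes barely overlap, which is where plain IoU gradients degenerate; blending the two terms keeps the benefits of each.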


Data availability

The datasets used in this article are all publicly available; for example, the TT100K dataset can be obtained at https://cg.cs.tsinghua.edu.cn/traffic-sign/.


Author information

Contributions

Jianhong Mu—original draft; Qinghua Su—review & editing; Xiyu Wang, Wenhui Liang, Sheng Xu, and Kaizheng Wan assisted with some comparative experiments during the revision process. All authors reviewed the manuscript.

Corresponding authors

Correspondence to Jianhong Mu or Qinghua Su.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Mu, J., Su, Q., Wang, X. et al. A small object detection architecture with concatenated detection heads and multi-head mixed self-attention mechanism. J Real-Time Image Proc 21, 184 (2024). https://doi.org/10.1007/s11554-024-01562-1
