Towards more accurate object detection via encoding reinforcement and multi-channel enhancement

Wang, Weina; Li, Shuangyong; Jumahong, Huxidan

doi:10.1007/s10489-024-06200-8


Published: 23 December 2024

Applied Intelligence, Volume 55, article number 212 (2025)


Abstract

Existing object detection networks typically use small-kernel convolutions, which extract sufficient features for recognizing targets but have small receptive fields and weak long-range dependency modeling. This paper proposes an object detection network whose structure features large-kernel convolutions and multiple channels. First, an encoding reinforcement module based on large-kernel convolutions is designed to enlarge the receptive field and improve global feature extraction. Then, a channel enhancement module is constructed to strengthen the learning of structural information. Both the encoding reinforcement and channel enhancement modules are designed to be lightweight. Finally, the Wise-IoU (WIoU) loss function is introduced to improve the model's robustness on low-quality datasets. In the experiments, the proposed model achieves the best performance compared with existing CNN-based lightweight models of similar parameter count or computational complexity.
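As a rough illustration of the trade-off described above, the sketch below computes the theoretical receptive field of a stack of stride-1 convolutions and compares weight counts for dense versus depthwise-separable kernels. It assumes stride 1, no dilation, no biases, and equal input and output channel counts; the helper names are hypothetical, and the depthwise-separable factorization is one common way to keep large kernels lightweight, not necessarily the exact design used in this paper.

```python
"""Receptive-field and weight-count arithmetic for convolution stacks.

Hypothetical helpers for illustration only. Assumptions: stride 1,
no dilation, biases ignored, and C input and C output channels per layer.
"""


def receptive_field(kernel_sizes):
    """Theoretical receptive field (pixels per side) of stacked stride-1 convs."""
    rf = 1
    for k in kernel_sizes:
        # Each stride-1, undilated k x k layer widens the field by k - 1 pixels.
        rf += k - 1
    return rf


def dense_conv_params(k, channels):
    """Weights in one dense k x k convolution mapping C channels to C channels."""
    return k * k * channels * channels


def depthwise_separable_params(k, channels):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * channels     # one k x k filter per channel
    pointwise = channels * channels  # 1 x 1 conv mixes information across channels
    return depthwise + pointwise


if __name__ == "__main__":
    C = 64
    small_stack = [3, 3, 3]
    print("3 x (3x3) dense:", receptive_field(small_stack), "px,",
          sum(dense_conv_params(k, C) for k in small_stack), "weights")
    print("1 x (31x31) dense:", receptive_field([31]), "px,",
          dense_conv_params(31, C), "weights")
    print("1 x (31x31) depthwise + 1x1:", receptive_field([31]), "px,",
          depthwise_separable_params(31, C), "weights")
```

With C = 64, three dense 3 x 3 layers reach a 7-pixel field with 110,592 weights, a dense 31 x 31 layer reaches 31 pixels with 3,936,256 weights, and a 31 x 31 depthwise layer plus a 1 x 1 pointwise layer reaches the same 31 pixels with only 65,600 weights. The theoretical field is an upper bound; the effective receptive field of a trained network is usually smaller.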


Data Availability

The datasets supporting this research are publicly available: Pascal VOC and MS COCO.


Acknowledgements

This work was supported in part by the Natural Science Foundation of China under Grant 62266046, the Natural Science Foundation of Jilin Province, China, under Grant YDZJ202201ZYTS603, and the Natural Science Foundation of Jilin Provincial Department of Education, China, under Grant JJKH20230281KJ.

Author information

Weina Wang and Shuangyong Li contributed equally to this work.

Authors and Affiliations

College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin, 132000, China
Weina Wang & Shuangyong Li
School of Network Security and Information technology, YiLi Normal University, Yining, 835000, China
Huxidan Jumahong


Corresponding author

Correspondence to Weina Wang.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Wang, W., Li, S. & Jumahong, H. Towards more accurate object detection via encoding reinforcement and multi-channel enhancement. Appl Intell 55, 212 (2025). https://doi.org/10.1007/s10489-024-06200-8


Accepted: 13 December 2024
Published: 23 December 2024
DOI: https://doi.org/10.1007/s10489-024-06200-8
