DOI: 10.1145/3641584.3641661
research-article

I-YOLO: Improved Progressive Feature Pyramid and Wise-IOU for Object Detection

Published: 14 June 2024

Abstract

The YOLO algorithm, the most representative one-stage object detection method, uses deep neural networks to recognize and localize objects in real time, which makes it well suited to deployment in detection systems. YOLOv7 outperforms earlier YOLO versions in both accuracy and speed, but its detection accuracy still leaves room for improvement. To address this, we propose an improved model, I-YOLO, based on YOLOv7. First, we propose a progressive feature pyramid network with a distillation module. This network improves the efficiency of the model while reducing the semantic gaps between non-adjacent layers, and it uses adaptive spatial fusion operations to alleviate conflicts in target information when features are fused across layers. Second, we introduce the Wise-IoU loss function to optimize the model and improve its accuracy. Furthermore, we train I-YOLO from scratch only on the PASCAL VOC 2007 and 2012 datasets, without any other datasets or pre-trained weights. Experimental results show that I-YOLO achieves competitive results of 55.9% AP (74.7% AP50), an improvement of about 2.5% over the baseline model (YOLOv7).
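The adaptive spatial fusion step can be illustrated with a minimal sketch. The abstract does not describe the module's exact design, so the code below assumes an ASFF-style formulation. In that formulation, each pyramid level receives a weight at every spatial location, and a softmax across levels normalizes the weights so they sum to 1. The sketch simplifies this to single-channel maps that have already been resized to a common resolution. The function name `adaptive_spatial_fusion` and the hand-supplied `logits` are illustrative assumptions, not the authors' code. In a network, the logits would come from learned 1×1 convolutions.

```python
import math
from typing import List

Grid = List[List[float]]  # one single-channel H x W feature map (assumption: lists, not tensors)


def adaptive_spatial_fusion(features: List[Grid], logits: List[Grid]) -> Grid:
    """Fuse same-sized maps from several pyramid levels with per-pixel weights.

    Hypothetical helper: features[k] and logits[k] belong to level k and are
    assumed to be resized to a common H x W already. At every location the
    logits are turned into weights with a softmax over levels, so the weights
    are positive and sum to 1.
    """
    h, w = len(features[0]), len(features[0][0])
    fused = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            scores = [level[i][j] for level in logits]
            m = max(scores)  # subtract the max for numerical stability
            exps = [math.exp(s - m) for s in scores]
            total = sum(exps)
            # Weighted sum of the levels' values at (i, j).
            fused[i][j] = sum((e / total) * f[i][j] for e, f in zip(exps, features))
    return fused
```

With equal logits, the fusion reduces to a plain average of the levels. A strongly dominant logit at a location lets that level's value pass through almost unchanged. This per-location selection is how learned weights can suppress conflicting information from other levels.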

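The Wise-IoU loss can be sketched as follows, assuming axis-aligned boxes given as (x1, y1, x2, y2).

- Version 1 scales the IoU loss L_IoU = 1 − IoU by R_WIoU = exp(((x − x_gt)² + (y − y_gt)²) / (W_g² + H_g²)). Here (x, y) and (x_gt, y_gt) are the box centres, and W_g and H_g are the width and height of the smallest box enclosing both.
- Version 3 further multiplies the v1 loss by a non-monotonic focusing coefficient r = β / (δ · α^(β − δ)). The outlier degree is β = L_IoU / mean(L_IoU).

The abstract does not say which version I-YOLO uses. The function names, the defaults α = 1.9 and δ = 3, and passing the mean L_IoU in as an argument are all assumptions for illustration. In real training, the enclosing-box denominator and β are detached from the gradient. This scalar version computes no gradients.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with positive width and height


def iou_and_enclosing(pred: Box, gt: Box) -> Tuple[float, float, float]:
    """Return IoU and the width/height of the smallest box enclosing both boxes."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    inter_w = max(0.0, min(px2, gx2) - max(px1, gx1))
    inter_h = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter
    wg = max(px2, gx2) - min(px1, gx1)
    hg = max(py2, gy2) - min(py1, gy1)
    return inter / union, wg, hg


def wiou_v1(pred: Box, gt: Box) -> Tuple[float, float]:
    """Return (Wise-IoU v1 loss, plain IoU loss) for one predicted/ground-truth pair."""
    iou, wg, hg = iou_and_enclosing(pred, gt)
    px, py = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    gx, gy = (gt[0] + gt[2]) / 2, (gt[1] + gt[3]) / 2
    # Centre-distance attention term; in training the denominator is detached.
    r_wiou = math.exp(((px - gx) ** 2 + (py - gy) ** 2) / (wg ** 2 + hg ** 2))
    l_iou = 1.0 - iou
    return r_wiou * l_iou, l_iou


def wiou_v3(pred: Box, gt: Box, mean_l_iou: float,
            alpha: float = 1.9, delta: float = 3.0) -> float:
    """Wise-IoU v3: the v1 loss scaled by a non-monotonic focusing coefficient.

    mean_l_iou is assumed to be supplied by the caller (e.g. a running mean).
    """
    loss_v1, l_iou = wiou_v1(pred, gt)
    beta = l_iou / mean_l_iou  # outlier degree (detached in training)
    r = beta / (delta * alpha ** (beta - delta))  # equals 1 when beta == delta
    return r * loss_v1
```

For example, `wiou_v1((0, 0, 2, 2), (1, 0, 3, 2))` has L_IoU = 2/3, and the centre-distance term multiplies it by exp(1/13), about 1.08. Identical boxes give a loss of 0.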

Published In

AIPR '23: Proceedings of the 2023 6th International Conference on Artificial Intelligence and Pattern Recognition
September 2023
1540 pages
ISBN:9798400707674
DOI:10.1145/3641584
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. YOLOV7
  2. deep neural networks
  3. feature pyramid network, loss function
  4. object detection

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

AIPR 2023
