I-YOLO: Improved Progressive Feature Pyramid and Wise-IOU for Object Detection

Authors:

Qian Wang

AIPR '23: Proceedings of the 2023 6th International Conference on Artificial Intelligence and Pattern Recognition

Pages 517 - 522

https://doi.org/10.1145/3641584.3641661

Published: 14 June 2024

Abstract

The YOLO algorithm, the most representative one-stage object detection method, uses a deep neural network for object recognition and localization and achieves real-time performance suitable for deployment in detection systems. YOLOv7 outperforms earlier models in the YOLO series in both accuracy and speed, but its detection accuracy still leaves room for improvement. To address this, we propose I-YOLO, an improved model based on YOLOv7. First, we propose a progressive feature pyramid network with a distillation module, which improves the efficiency of the model and reduces the semantic gaps between non-adjacent layers; it further uses adaptive spatial fusion operations to alleviate conflicts in target information when fusing features across layers. Second, we introduce the Wise-IoU loss function to optimize the model and improve its accuracy. We train I-YOLO from scratch on only the PASCAL VOC 2007 and 2012 datasets, without any other datasets or pre-trained weights. Experiments show competitive results: I-YOLO reaches 55.9% AP (74.7% AP50), an improvement of about 2.5% over the baseline model (YOLOv7).
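To make the Wise-IoU term in the abstract more concrete, the following is a minimal sketch of the simplest variant (v1) as described in the Wise-IoU preprint (Tong et al., arXiv:2301.10051): the plain IoU loss is scaled by a factor that grows with the distance between box centres, normalised by the size of the smallest enclosing box. The abstract does not say which Wise-IoU variant I-YOLO uses, so this is an illustration under that assumption, not the paper's implementation. The function names `iou` and `wise_iou_v1` and the `(x1, y1, x2, y2)` box layout are hypothetical choices made for this example.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def wise_iou_v1(pred, target):
    """Wise-IoU v1 loss for one predicted box and one ground-truth box.

    loss = exp(centre_dist^2 / (Wg^2 + Hg^2)) * (1 - IoU),
    where Wg and Hg are the width and height of the smallest enclosing box.
    In a training framework the denominator is treated as a constant
    (detached from the gradient); plain Python has no gradients, so that
    detail does not appear here.
    """
    # Width and height of the smallest box enclosing both inputs.
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])

    # Squared distance between the two box centres.
    cx_p, cy_p = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    cx_t, cy_t = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    dist2 = (cx_p - cx_t) ** 2 + (cy_p - cy_t) ** 2

    # Distance-based scaling factor; falls back to 1 for degenerate boxes.
    denom = wg ** 2 + hg ** 2
    scale = math.exp(dist2 / denom) if denom > 0 else 1.0

    return scale * (1.0 - iou(pred, target))


if __name__ == "__main__":
    print(wise_iou_v1((0, 0, 2, 2), (0, 0, 2, 2)))  # perfectly aligned boxes
    print(wise_iou_v1((0, 0, 2, 2), (1, 1, 3, 3)))  # partially overlapping boxes
```

A perfectly aligned prediction gives zero loss, and the scaling factor makes the penalty exceed the plain IoU loss as the centres drift apart. The later variants described in the preprint add a dynamic focusing coefficient on top of this term, which is omitted here.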

Published In

AIPR '23: Proceedings of the 2023 6th International Conference on Artificial Intelligence and Pattern Recognition

September 2023

1540 pages

ISBN:9798400707674

DOI:10.1145/3641584

Copyright © 2023 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

AIPR 2023

AIPR 2023: 2023 6th International Conference on Artificial Intelligence and Pattern Recognition

September 22 - 24, 2023

Xiamen, China
