An Object Detection Algorithm Combining FPN Structure With DETR

Authors:
Nan Xiang, Chuanzhong Pan, Xiaozhao Li

Liangjiang International College, Chongqing University of Technology, China

ICCCV '21: Proceedings of the 4th International Conference on Control and Computer Vision, August 2021, Pages 57–63
https://doi.org/10.1145/3484274.3484284

Published: 23 November 2021

ABSTRACT

To address the low detection accuracy of the DETR model on small and medium objects, this paper proposes an object detection algorithm that combines improved feature extraction and an FPN structure with DETR. The method first extracts features from the input image with an improved Darknet53 network. In this step, the 104×104 feature map produced after the first residual block of the second stage is output as an additional, fourth-scale feature map. Together with the feature maps from the original three output stages, this gives four feature maps at different scales. Next, an FPN down-samples and up-samples the four feature maps and merges them into a single 52×52 feature map. The fused feature map is then combined with a positional encoding and fed into the Transformer, and FFNs predict the category and position of each object from the Transformer output. On the COCO2017 dataset, the method improves accuracy over the models it is compared with.
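
The scale bookkeeping in the abstract can be checked with a short calculation. The sketch below is illustrative only and is not the authors' code: it assumes a 416×416 input (the usual Darknet53/YOLOv3 resolution, which matches the 104×104 and 52×52 sizes above) and cumulative strides of 4, 8, 16 and 32 for the four tapped stages. The names `STRIDES`, `feature_map_sizes` and `resample_plan` are hypothetical.

```python
# Assumption: 416x416 input, consistent with the 104x104 and 52x52 maps
# named in the abstract. Not taken from the paper itself.
INPUT_SIZE = 416
TARGET_SIZE = 52  # fused output scale stated in the abstract

# Assumed cumulative strides of the four tapped Darknet53 stages.
STRIDES = {"stage2": 4, "stage3": 8, "stage4": 16, "stage5": 32}


def feature_map_sizes(input_size, strides):
    """Return the side length of each feature map for a square input."""
    return {name: input_size // s for name, s in strides.items()}


def resample_plan(sizes, target):
    """Describe how each map must be resized to reach the target side length."""
    plan = {}
    for name, size in sizes.items():
        if size > target:
            plan[name] = ("downsample", size // target)
        elif size < target:
            plan[name] = ("upsample", target // size)
        else:
            plan[name] = ("keep", 1)
    return plan


sizes = feature_map_sizes(INPUT_SIZE, STRIDES)
plan = resample_plan(sizes, TARGET_SIZE)
# Number of spatial positions the Transformer would see after flattening.
num_tokens = TARGET_SIZE * TARGET_SIZE

for name in STRIDES:
    op, factor = plan[name]
    print(f"{name}: {sizes[name]}x{sizes[name]} -> {op} x{factor}")
print(f"Transformer input tokens: {num_tokens}")
```

Under these assumptions, the added 104×104 map is the only one that needs down-sampling, while the 26×26 and 13×13 maps are up-sampled to reach the common 52×52 scale.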

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
  2. Machine learning
    1. Machine learning algorithms

Index terms have been assigned to the content through auto-classification.

Published in

ICCCV '21: Proceedings of the 4th International Conference on Control and Computer Vision
August 2021
207 pages
ISBN:9781450390477
DOI:10.1145/3484274

Copyright © 2021 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
Darknet53
Detection Transformer
Feature Pyramid Network
Object Detection
Qualifiers
- research-article
- Research
- Refereed limited