ABSTRACT
To address the low detection accuracy of the DETR model on small and medium-sized objects, this paper proposes an object detection algorithm that combines improved feature extraction and an FPN structure with DETR. The method first extracts features from the original image with an improved Darknet53 network. In this step, the 104×104 feature map produced after the first residual block of the second stage is output as an additional, fourth-scale feature map. It is combined with the feature maps output by the original three stages to give four feature maps at different scales. Next, an FPN down-samples and up-samples the four feature maps and fuses them into a single output at the 52×52 scale. The fused feature map is then combined with the positional encoding and fed into the Transformer, and FFNs applied to the Transformer output predict the category and position of each object. On the COCO2017 dataset, the method achieves higher accuracy than the other models it is compared with.
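The scale arithmetic behind the pipeline above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the 416×416 input size is an assumption inferred from the 104×104 second-stage map, the stage names and the helper functions `feature_sizes` and `resample_plan` are hypothetical, and treating each cell of the fused map as one Transformer token follows the usual DETR convention rather than anything stated in the abstract.

```python
# Sketch only: the 416x416 input size is an assumption inferred from the
# 104x104 second-stage map; the abstract does not state it.
INPUT_SIZE = 416

# Cumulative strides of the four Darknet53 outputs (hypothetical stage names).
STRIDES = {"stage2": 4, "stage3": 8, "stage4": 16, "stage5": 32}
TARGET = 52  # fused output resolution named in the abstract


def feature_sizes(input_size=INPUT_SIZE, strides=STRIDES):
    """Return the spatial side length of each backbone output."""
    return {name: input_size // s for name, s in strides.items()}


def resample_plan(sizes, target=TARGET):
    """Describe how each map must be resized to reach the target scale."""
    plan = {}
    for name, size in sizes.items():
        if size > target:
            plan[name] = ("down", size // target)
        elif size < target:
            plan[name] = ("up", target // size)
        else:
            plan[name] = ("keep", 1)
    return plan


sizes = feature_sizes()
plan = resample_plan(sizes)
# Sequence length if each cell of the fused map becomes one token (assumption).
num_tokens = TARGET * TARGET

for name in sizes:
    print(name, sizes[name], plan[name])
print("tokens:", num_tokens)
```

Under these assumptions, the extra 104×104 map is the only one that needs down-sampling, while the 26×26 and 13×13 maps are up-sampled by factors of 2 and 4 before fusion.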
Index Terms
- An Object Detection Algorithm Combining FPN Structure With DETR