Abstract
In computer vision, object detection is a fundamental task aimed at accurately identifying and localizing objects of various sizes within images. While existing models such as You Only Look Once (YOLO), Adaptive Training Sample Selection (ATSS), and Task-aligned One-stage Object Detection (TOOD) have made breakthroughs in this field, they still exhibit deficiencies in information fusion within their neck structures. To overcome these limitations, we have designed an innovative model architecture known as the Adaptive Feature Refinement Network (AFRNet). On the one hand, the model discards the conventional Feature Pyramid Network and adopts a novel neck structure that combines Scale Sequence Feature Fusion (SSFF) with the Gather-and-Distribute (GD) mechanism; our experiments demonstrate that SSFF further enhances the multi-scale feature fusion of the GD mechanism, thereby improving object detection performance. On the other hand, to address the constraints of existing models in modeling geometric transformations, we have designed an improved deformable convolution structure called Attentive Deformable ConvNet, which integrates an enhanced attention mechanism to capture key image features more precisely. Extensive experiments on the MS-COCO dataset validate the effectiveness of our model: in single-model, single-scale testing, AFRNet achieves an Average Precision (AP) of 51.8%, underscoring a significant improvement in object detection performance.
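To illustrate the gather-and-distribute idea behind the neck design, the following is a minimal numpy sketch, not the authors' implementation: every pyramid level is gathered (resized) to a common resolution, fused by a weighted sum, and the fused map is distributed back to each level's resolution. The function names and the uniform weighting are illustrative assumptions.

```python
import numpy as np

def nearest_resize(x, out_h, out_w):
    """Nearest-neighbour resize of a (C, H, W) feature map."""
    c, h, w = x.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return x[:, rows][:, :, cols]

def gather_and_distribute(features, weights=None):
    """Toy gather-and-distribute fusion over a feature pyramid:
    1. Gather: resize every level to the finest resolution.
    2. Fuse: weighted sum across levels.
    3. Distribute: resize the fused map back to each level's size.
    """
    target_h, target_w = features[0].shape[1:]
    gathered = [nearest_resize(f, target_h, target_w) for f in features]
    if weights is None:
        weights = np.full(len(features), 1.0 / len(features))
    fused = sum(w * g for w, g in zip(weights, gathered))
    return [nearest_resize(fused, f.shape[1], f.shape[2]) for f in features]

# Three pyramid levels, e.g. strides 8/16/32 for a 64x64 input
feats = [np.random.rand(16, s, s) for s in (8, 4, 2)]
out = gather_and_distribute(feats)
print([o.shape for o in out])  # each level keeps its own resolution
```

In the actual model the fusion is learned (convolutions and attention rather than a fixed weighted sum), but the information flow — gather all scales, fuse once, redistribute — is the same.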





Data availability
The data that support the findings of this study are openly available in [15] at http://images.cocodataset.org/zips/train2017.zip.
References
Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Transact. Pattern Anal. Mach. Intell. 39, 1137–1149 (2017)
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn, 2961–2969 (2017)
Ghiasi, G., Lin, T.-Y., Le, Q.V.: Nas-fpn: learning scalable feature pyramid architecture for object detection, 7036–7045 (2019)
Chen, Y., et al.: Detnas: backbone search for object detection. Adv. Neural Inform. Process. Syst. 32 (2019)
Guo, J., et al.: Hit-detector: hierarchical trinity architecture search for object detection, 11405–11414 (2020)
Duan, K., et al.: Centernet: keypoint triplets for object detection, 6569–6578 (2019)
Tian, Z., Shen, C., Chen, H., He, T.: Fcos: a simple and strong anchor-free object detector. IEEE Trans. Pattern Anal. Mach. Intell. 44, 1922–1933 (2020)
Ge, Z., Liu, S., Wang, F., Li, Z., Sun, J.: Yolox: exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430 (2021)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition, 770–778 (2016)
Koonce, B.: Efficientnet. In: Convolutional Neural Networks with Swift for TensorFlow: Image Recognition and Dataset Categorization, 109–123 (2021)
Redmon, J., Farhadi, A.: Yolo9000: better, faster, stronger, 7263–7271 (2017)
Liu, W., et al.: Ssd: single shot multibox detector, 21–37 (Springer, 2016)
Lin, T.-Y., et al.: Feature pyramid networks for object detection, 2117–2125 (2017)
Wang, R., et al.: Dcn v2: improved deep & cross network and practical lessons for web-scale learning to rank systems, 1785–1797 (2021)
Lin, T.-Y., et al.: Microsoft coco: common objects in context, 740–755 (Springer, 2014)
Zhang, S., Chi, C., Yao, Y., Lei, Z., Li, S.Z.: Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection, 9759–9768 (2020)
Feng, C., Zhong, Y., Gao, Y., Scott, M.R., Huang, W.: Tood: Task-aligned one-stage object detection, 3490–3499 (IEEE Computer Society, 2021)
Li, X., et al.: Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. Adv. Neural. Inf. Process. Syst. 33, 21002–21012 (2020)
Kim, K., Lee, H.S.: Probabilistic anchor assignment with iou prediction for object detection, 355–371 (Springer, 2020)
Sermanet, P., et al.: Overfeat: integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229 (2013)
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection, 779–788 (2016)
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection, 2980–2988 (2017)
Duan, K., et al.: Keypoint triplets for object detection, 27–32 (2019)
Law, H., Deng, J.: Cornernet: detecting objects as paired keypoints, 734–750 (2018)
Zhou, X., Wang, D., Krähenbühl, P.: Objects as points. arXiv preprint arXiv:1904.07850 (2019)
Kong, T., et al.: Foveabox: beyound anchor-based object detection. IEEE Trans. Image Process. 29, 7389–7398 (2020)
Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation, 8759–8768 (2018)
Tan, M., Pang, R., Le, Q.V.: Efficientdet: scalable and efficient object detection, 10781–10790 (2020)
Quan, Y., Zhang, D., Zhang, L., Tang, J.: Centralized feature pyramid for object detection. IEEE Trans. Image Process. (2023)
Yang, G., et al.: Afpn: asymptotic feature pyramid network for object detection, 2184–2189 (IEEE, 2023)
Kang, M., Ting, C.-M., Ting, F.F., Phan, R.C.-W.: Asf-yolo: a novel yolo model with attentional scale sequence fusion for cell instance segmentation. arXiv preprint arXiv:2312.06458 (2023)
Wang, C., et al.: Gold-yolo: efficient object detector via gather-and-distribute mechanism. Adv. Neural Inform. Process. Syst. 36 (2024)
Dai, J., et al.: Deformable convolutional networks, 764–773 (2017)
Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. Adv. Neural Inform. Process. Syst. (2015)
Funding
This research was funded by the National Natural Science Foundation of China under Grant No. 12061066.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The corresponding author states that there is no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, J., Yang, Y., Liu, J. et al. AFRNet: adaptive feature refinement network. SIViP 18, 7779–7788 (2024). https://doi.org/10.1007/s11760-024-03427-3