AFRNet: adaptive feature refinement network

  • Original Paper
  • Published in Signal, Image and Video Processing

Abstract

In computer vision, object detection is a fundamental task aimed at accurately identifying and localizing objects of various sizes within images. While existing models such as You Only Look Once (YOLO), Adaptive Training Sample Selection (ATSS), and Task-aligned One-stage Object Detection (TOOD) have made breakthroughs in this field, they still exhibit deficiencies in information fusion within their neck structures. To overcome these limitations, we have designed an innovative model architecture called the Adaptive Feature Refinement Network (AFRNet). On one hand, the model discards the conventional Feature Pyramid Network structure in favor of a novel neck that combines Scale Sequence Feature Fusion (SSFF) with the Gather-and-Distribute (GD) mechanism; our experiments demonstrate that SSFF further enhances the multi-scale feature fusion of the GD mechanism, thereby improving object detection performance. On the other hand, to address the constraints of existing models in modeling geometric transformations, we have designed an advanced deformable convolution structure called Attentive Deformable ConvNet, which integrates an improved attention mechanism to capture key image features more precisely. Extensive experiments on the MS-COCO dataset validate the effectiveness of our model: in single-model, single-scale testing, it achieves an Average Precision (AP) of 51.8%, a result that underscores a significant enhancement in object detection performance.
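The Attentive Deformable ConvNet described above builds on the deformable convolution family [14, 33], in which each kernel tap samples the input at a learned spatial offset and is scaled by a learned attention (modulation) weight. As a rough illustration of that general idea only (not the paper's implementation; the function names, the single-channel setting, and the per-tap attention layout are all assumptions for exposition), a minimal NumPy sketch:

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly sample feat (H, W) at real-valued (y, x); zero outside the map."""
    H, W = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    val = 0.0
    for dy in (0, 1):
        for dx in (0, 1):
            yy, xx = y0 + dy, x0 + dx
            if 0 <= yy < H and 0 <= xx < W:
                # Bilinear weights: closeness to each of the 4 neighbors.
                val += (1 - abs(y - yy)) * (1 - abs(x - xx)) * feat[yy, xx]
    return val

def attentive_deform_conv(feat, weight, offsets, attn):
    """
    feat:    (H, W) single-channel feature map
    weight:  (3, 3) convolution kernel
    offsets: (H, W, 9, 2) learned (dy, dx) per output position and kernel tap
    attn:    (H, W, 9) per-tap attention in [0, 1] modulating each sample

    out[p] = sum_k attn[p, k] * w_k * feat[p + p_k + offsets[p, k]]
    """
    H, W = feat.shape
    grid = [(ky, kx) for ky in (-1, 0, 1) for kx in (-1, 0, 1)]
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            acc = 0.0
            for k, (ky, kx) in enumerate(grid):
                dy, dx = offsets[i, j, k]
                s = bilinear_sample(feat, i + ky + dy, j + kx + dx)
                acc += attn[i, j, k] * weight[ky + 1, kx + 1] * s
            out[i, j] = acc
    return out
```

With zero offsets and unit attention this reduces to a plain 3×3 convolution with zero padding; the learned offsets let the effective receptive field deform to an object's geometry, and the attention term can suppress uninformative sampling locations.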


[Figures 1–5 appear in the full article.]


Data availability

The data that support the findings of this study are openly available in [15] at http://images.cocodataset.org/zips/train2017.zip and http://images.cocodataset.org/zips/val2017.zip.

References

  1. Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39, 1137–1149 (2017)


  2. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask r-cnn, 2961–2969 (2017)

  3. Ghiasi, G., Lin, T.-Y., Le, Q.V.: Nas-fpn: learning scalable feature pyramid architecture for object detection, 7036–7045 (2019)

  4. Chen, Y., et al.: Detnas: backbone search for object detection. Adv. Neural Inform. Process. Syst. 32 (2019)

  5. Guo, J., et al.: Hit-detector: hierarchical trinity architecture search for object detection, 11405–11414 (2020)

  6. Duan, K., et al.: Centernet: keypoint triplets for object detection, 6569–6578 (2019)

  7. Tian, Z., Shen, C., Chen, H., He, T.: Fcos: a simple and strong anchor-free object detector. IEEE Trans. Pattern Anal. Mach. Intell. 44, 1922–1933 (2020)


  8. Ge, Z., Liu, S., Wang, F., Li, Z., Sun, J.: Yolox: exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430 (2021)

  9. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition, 770–778 (2016)

  10. Koonce, B.: EfficientNet. In: Convolutional Neural Networks with Swift for TensorFlow: Image Recognition and Dataset Categorization, 109–123 (2021)

  11. Redmon, J., Farhadi, A.: Yolo9000: better, faster, stronger, 7263–7271 (2017)

  12. Liu, W., et al.: Ssd: single shot multibox detector, 21–37 (Springer, 2016)

  13. Lin, T.-Y., et al.: Feature pyramid networks for object detection, 2117–2125 (2017)

  14. Wang, R., et al.: Dcn v2: improved deep & cross network and practical lessons for web-scale learning to rank systems, 1785–1797 (2021)

  15. Lin, T.-Y., et al.: Microsoft coco: common objects in context, 740–755 (Springer, 2014)

  16. Zhang, S., Chi, C., Yao, Y., Lei, Z., Li, S.Z.: Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection, 9759–9768 (2020)

  17. Feng, C., Zhong, Y., Gao, Y., Scott, M.R., Huang, W.: Tood: Task-aligned one-stage object detection, 3490–3499 (IEEE Computer Society, 2021)

  18. Li, X., et al.: Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection. Adv. Neural. Inf. Process. Syst. 33, 21002–21012 (2020)


  19. Kim, K., Lee, H.S.: Probabilistic anchor assignment with iou prediction for object detection, 355–371 (Springer, 2020)

  20. Sermanet, P., et al.: Overfeat: integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229 (2013)

  21. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection, 779–788 (2016)

  22. Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection, 2980–2988 (2017)

  23. Duan, K., et al.: Keypoint triplets for object detection, 27–32 (2019)

  24. Law, H., Deng, J.: Cornernet: detecting objects as paired keypoints, 734–750 (2018)

  25. Zhou, X., Wang, D., Krähenbühl, P.: Objects as points. arXiv preprint arXiv:1904.07850 (2019)

  26. Kong, T., et al.: Foveabox: beyound anchor-based object detection. IEEE Trans. Image Process. 29, 7389–7398 (2020)


  27. Liu, S., Qi, L., Qin, H., Shi, J., Jia, J.: Path aggregation network for instance segmentation, 8759–8768 (2018)

  28. Tan, M., Pang, R., Le, Q.V.: Efficientdet: scalable and efficient object detection, 10781–10790 (2020)

  29. Quan, Y., Zhang, D., Zhang, L., Tang, J.: Centralized feature pyramid for object detection. IEEE Trans. Image Process. (2023)

  30. Yang, G., et al.: Afpn: asymptotic feature pyramid network for object detection, 2184–2189 (IEEE, 2023)

  31. Kang, M., Ting, C.-M., Ting, F.F., Phan, R.C.-W.: Asf-yolo: a novel yolo model with attentional scale sequence fusion for cell instance segmentation. arXiv preprint arXiv:2312.06458 (2023)

  32. Wang, C., et al.: Gold-yolo: efficient object detector via gather-and-distribute mechanism. Adv. Neural Inform. Process. Syst. 36 (2024)

  33. Dai, J., et al.: Deformable convolutional networks, 764–773 (2017)

  34. Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: towards real-time object detection with region proposal networks. Adv. Neural Inform. Process. Syst.


Funding

This research was funded by the National Natural Science Foundation of China under Grant No. 12061066.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Jilong Zhang.

Ethics declarations

Conflict of interest

The corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


Cite this article

Zhang, J., Yang, Y., Liu, J. et al. AFRNet: adaptive feature refinement network. SIViP 18, 7779–7788 (2024). https://doi.org/10.1007/s11760-024-03427-3
