
Automatic label assignment object detection method on only one feature map

  • Original Paper
  • Published in Machine Vision and Applications

Abstract

Most deep learning-based object detection methods are built on multi-level feature pyramids, and even the approaches that attempt detection at a single feature level typically still draw on multiple feature maps. In this paper, we propose a novel anchor-free object detection approach with an automatic label assignment strategy that operates on only one feature map. The proposed method follows the main idea of AutoAssign for label assignment; however, several modifications are required to make the strategy work in a single-feature-map setting. We introduce a prior-knowledge fusion method called 'Center Weighting Fusion,' in which a Gaussian mixture model is applied to compute the weight of each object on the single feature map. As a result, objects that lie close to each other have their weights merged, producing points ('Recheck Points') that are shared by multiple objects. For each Recheck Point, the detector determines how many objects share the point and generates a corresponding number of differently sized proposals. To handle objects of different sizes, we further propose a 'Uniform Detection' method that limits each point's regression distance according to the target's category. Extensive experiments show that the proposed method achieves detection accuracy competitive with standard anchor-free detectors (43.8% mAP) while being about 30% smaller and about 50% faster.
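The ideas in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the isotropic Gaussians, the max-merge of weights, the sharing threshold, and the per-category regression limits are all simplifying assumptions made here to show how per-object Gaussian weights on a single feature map can be fused, how locations claimed by several objects could be flagged as 'Recheck Points,' and how a 'Uniform Detection'-style cap on regression distance might look.

```python
import numpy as np

# Hypothetical per-category regression limits (in feature-map cells);
# the paper's actual 'Uniform Detection' values are not given here.
CATEGORY_REG_LIMITS = {"person": 8.0, "car": 16.0}

def gaussian_weight(ys, xs, cy, cx, sigma):
    """Isotropic 2-D Gaussian centred on an object, evaluated on the grid."""
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def center_weighting_fusion(feat_h, feat_w, centers, sigma=2.0, thresh=0.5):
    """Mixture of per-object Gaussians on a single feature map.

    Returns the fused (max-merged) weight map and the 'recheck points':
    grid locations where at least two objects' weights exceed `thresh`.
    """
    ys, xs = np.mgrid[0:feat_h, 0:feat_w]
    per_obj = np.stack([gaussian_weight(ys, xs, cy, cx, sigma)
                        for cy, cx in centers])     # (N, H, W)
    fused = per_obj.max(axis=0)                     # merged weight map
    n_sharing = (per_obj > thresh).sum(axis=0)      # objects claiming a point
    recheck_points = np.argwhere(n_sharing >= 2)    # shared locations
    return fused, recheck_points

def clamp_regression(dist, category):
    """Sketch of 'Uniform Detection': cap a point's predicted regression
    distance by a per-category limit."""
    return min(dist, CATEGORY_REG_LIMITS[category])

# Two nearby objects on a 16x16 feature map: the cells between their
# centres receive high weight from both, so they become recheck points.
fused, recheck = center_weighting_fusion(16, 16, centers=[(5, 5), (5, 8)])
```

A detector would then emit one proposal per object sharing each recheck point, each with a size regressed under that object's category limit.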

References

  1. Lin, T., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)

  2. He, K., Gkioxari, G., Dollar, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)

  3. Cai, Z., Vasconcelos, N.: Cascade R-CNN: delving into high quality object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6154–6162 (2018)

  4. Lin, T., Goyal, P., Girshick, R., He, K., Dollar, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)

  5. Tian, Z., Shen, C., Chen, H., He, T.: FCOS: fully convolutional one-stage object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 9627–9636 (2019)

  6. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. arXiv preprint arXiv:2005.12872 (2020)

  7. Chen, Q., Wang, Y., Yang, T., Zhang, X., Sun, J.: You only look one-level feature. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13034–13043 (2021)

  8. Zhang, S., Chi, C., Yao, Y., Lei, Z., Li, S.Z.: Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. arXiv preprint arXiv:1912.02424 (2019)

  9. Zhu, B., Wang, J., Jiang, Z., et al.: AutoAssign: differentiable label assignment for dense object detection. arXiv preprint arXiv:2007.03496 (2020)

  10. Lin, T., Maire, M., Belongie, S., et al.: Microsoft COCO: common objects in context. In: The European Conference on Computer Vision (2014)

  11. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results

  12. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results

  13. Shao, S., Li, Z., Zhang, T., Peng, C., Yu, G., Zhang, X., Li, J., Sun, J.: Objects365: a large-scale, high-quality dataset for object detection. In: The IEEE International Conference on Computer Vision (2019)

  14. Lin, Y., Sun, H., Liu, N., Bian, Y., Cen, J., Zhou, H.: A lightweight multi-scale context network for salient object detection in optical remote sensing images. In: 2022 26th International Conference on Pattern Recognition (ICPR), pp. 238–244 (2022)

  15. Tu, Z., Wang, C., Li, C., Fan, M., Zhao, H., Luo, B.: ORSI salient object detection via multiscale joint region and boundary model. IEEE Trans. Geosci. Remote Sens. 60, 1–13 (2022)

  16. Cong, R., Zhang, Y., Fang, L., Li, J., Zhao, Y., Kwong, S.: RRNet: relational reasoning network with parallel multiscale attention for salient object detection in optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 60, 1–11 (2022)

  17. Zhang, S., Chi, C., Yao, Y., Lei, Z., Li, S.Z.: Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9759–9768 (2020)

  18. Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., Ning, J., Cao, Y., Zhang, Z., Dong, L., et al.: Swin transformer v2: scaling up capacity and resolution. arXiv preprint arXiv:2111.09883 (2021)

  19. Yuan, L., Chen, D., Chen, Y., et al.: Florence: a new foundation model for computer vision. arXiv preprint arXiv:2111.11432 (2021)

  20. Zhang, H., Li, F., Liu, S.: DINO: DETR with improved denoising anchor boxes for end-to-end object detection. arXiv preprint arXiv:2203.03605 (2022)

  21. Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y.: Swin transformer: hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision, pp. 10012–10022 (2021)

  22. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)

  23. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)

  24. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)

  25. Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional networks. In: Advances in Neural Information Processing Systems, pp. 379–387 (2016)

  26. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)

  27. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017)

  28. Bochkovskiy, A., Wang, C., Liao, H.M.: YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)

  29. Wang, C., Bochkovskiy, A., Liao, H.M.: YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv preprint arXiv:2207.02696 (2022)

  30. Chen, H., Wang, Y., Guo, T., Xu, C.: Pre-trained image processing transformer. arXiv:2012.00364 (2020)

  31. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv:2010.11929 (2020)

  32. Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A.: Training data-efficient image transformers and distillation through attention. arXiv:2012.12877 (2020)

  33. Wang, W., Xie, E., Li, X., Fan, D., Song, K.: Pyramid vision transformer: a versatile backbone for dense prediction without convolutions. arXiv:2102.12122 (2021)

  34. Wang, J., Chen, K., Yang, S., Loy, C.C., Lin, D.: Region proposal by guided anchoring. In: The IEEE Conference on Computer Vision and Pattern Recognition (2019)

  35. Yang, T., Zhang, X., Li, Z., Zhang, W., Sun, J.: Metaanchor: learning to detect objects with customized anchors. In: Advances in Neural Information Processing Systems (2018)

  36. Zhu, C., He, Y., Savvides, M.: Feature selective anchor-free module for single-shot object detection. In: The IEEE Conference on Computer Vision and Pattern Recognition (2019)

  37. Zhu, C., Chen, F., Shen, Z., Savvides, M.: Soft anchor-point object detection. arXiv preprint arXiv:1911.12448 (2019)

  38. Zhang, X., Wan, F., Liu, C., Ji, R., Ye, Q.: Freeanchor: learning to match anchors for visual object detection. In: Advances in Neural Information Processing Systems (2019)

  39. Li, H., Wu, Z., Zhu, C., Xiong, C., Socher, R., Davis, L.S.: Learning from noisy anchors for one-stage object detection. arXiv preprint arXiv:1912.05086 (2019)

  40. Kim, K., Lee, H.S.: Probabilistic anchor assignment with IOU prediction for object detection. arXiv preprint arXiv:2007.08103 (2020)

  41. Law, H., Deng, J.: CornerNet: detecting objects as paired keypoints. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 734–750 (2018)

  42. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  43. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: The IEEE Conference on Computer Vision and Pattern Recognition (2009)

  44. He, K., Girshick, R., Dollar, P.: Rethinking imagenet pre-training. In: The IEEE International Conference on Computer Vision (2019)

  45. Zhou, X., Koltun, V., Krähenbühl, P.: Probabilistic two-stage detection. arXiv Preprint arXiv:2103.07461 (2021)

  46. Chu, X., Tian, Z., Wang, Y., Zhang, B., Ren, H., Wei, X., Xia, H., Shen, C.: Twins: revisiting spatial attention design in vision transformers. arXiv preprint arXiv:2104.13840 (2021)

  47. Wang, W., Xie, E., Li, X., Fan, D., Song, K., Liang, D., Lu, T., Luo, P., Shao, L.: Pyramid vision transformer: a versatile backbone for dense prediction without convolutions. arXiv preprint arXiv:2102.12122 (2021)

  48. Li, F., Zhang, H., Liu, S., Guo, J., Ni, L.M., Zhang, L.: DN-DETR: accelerate DETR training by introducing query denoising. arXiv preprint arXiv:2203.01305 (2022)

  49. Li, X., Wang, W., Hu, X., Li, J., Tang, J., Yang, J.: Generalized focal loss V2: learning reliable localization quality estimation for dense object detection. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 2021, pp. 11627–11636 (2021)

  50. Qiu, H., Ma, Y., Li, Z., Liu, S., Sun, J.: BorderDet: border feature for dense object detection. In: European Conference on Computer Vision, Springer, pp. 549–564 (2020)

  51. Tan, M., Pang, R., Le, Q.V.: EfficientDet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10781–10790 (2020)

  52. Dai, X., Chen, Y., Xiao, B., Chen, D., Liu, M., Yuan, L., Zhang, L.: Dynamic head: unifying object detection heads with attentions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7373–7382 (2021)

  53. Xie, S., Girshick, R.B., Dollar, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: CVPR (2017)

  54. Yang, J., Li, C., Gao, J.: Focal modulation networks. arXiv preprint arXiv:2203.11926 (2022)

Acknowledgements

This work was supported in part by the Sichuan Science and Technology Program under Grant 2023NSFSC0503, and the Opening Project of Intelligent Policing Key Laboratory of Sichuan Province.

Author information

Corresponding author

Correspondence to Tingsong Ma.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ma, T., Huang, Z., Yang, N. et al. Automatic label assignment object detection method on only one feature map. Machine Vision and Applications 35, 2 (2024). https://doi.org/10.1007/s00138-023-01481-4
