
Automatic label assignment object detection method on only one feature map

  • Original Paper
  • Published in Machine Vision and Applications

Abstract

Most deep learning-based object detection methods are built on multi-level feature pyramids, and even the approaches that attempt detection at a single feature level typically still draw on multiple feature maps. In this paper, we propose a novel anchor-free object detection approach with an automatic label assignment strategy that operates on only one feature map. The proposed method follows the main idea of AutoAssign for label assignment; however, several modifications are required to make the strategy work in a single-feature-map setting. We introduce a prior-knowledge fusion method called 'Center Weighting Fusion,' in which a Gaussian mixture model is applied to compute the weight of each object on the single feature map. As a result, objects that lie close to each other have their weights merged, producing points ('Recheck Points') that are shared by multiple objects. For each Recheck Point, the detector determines how many objects share the point and generates a corresponding number of differently sized proposals. To handle objects of different sizes, we further propose a 'Uniform Detection' method that limits each point's regression distance according to the target's category. Extensive experiments show that the proposed method achieves detection accuracy competitive with standard anchor-free detectors (43.8% mAP) while being about 30% smaller and about 50% faster.
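The ideas in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the isotropic Gaussians, the max-merge of weights, the sharing threshold, and the per-category regression limits are all simplifying assumptions made here to show how per-object Gaussian weights on a single feature map can be fused, how locations claimed by several objects could be flagged as 'Recheck Points,' and how a 'Uniform Detection'-style cap on regression distance might look.

```python
import numpy as np

# Hypothetical per-category regression limits (in feature-map cells);
# the paper's actual 'Uniform Detection' values are not given here.
CATEGORY_REG_LIMITS = {"person": 8.0, "car": 16.0}

def gaussian_weight(ys, xs, cy, cx, sigma):
    """Isotropic 2-D Gaussian centred on an object, evaluated on the grid."""
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def center_weighting_fusion(feat_h, feat_w, centers, sigma=2.0, thresh=0.5):
    """Mixture of per-object Gaussians on a single feature map.

    Returns the fused (max-merged) weight map and the 'recheck points':
    grid locations where at least two objects' weights exceed `thresh`.
    """
    ys, xs = np.mgrid[0:feat_h, 0:feat_w]
    per_obj = np.stack([gaussian_weight(ys, xs, cy, cx, sigma)
                        for cy, cx in centers])     # (N, H, W)
    fused = per_obj.max(axis=0)                     # merged weight map
    n_sharing = (per_obj > thresh).sum(axis=0)      # objects claiming a point
    recheck_points = np.argwhere(n_sharing >= 2)    # shared locations
    return fused, recheck_points

def clamp_regression(dist, category):
    """Sketch of 'Uniform Detection': cap a point's predicted regression
    distance by a per-category limit."""
    return min(dist, CATEGORY_REG_LIMITS[category])

# Two nearby objects on a 16x16 feature map: the cells between their
# centres receive high weight from both, so they become recheck points.
fused, recheck = center_weighting_fusion(16, 16, centers=[(5, 5), (5, 8)])
```

A detector would then emit one proposal per object sharing each recheck point, each with a size regressed under that object's category limit.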

References

  1. Lin, T., Dollar, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)

  2. He, K., Gkioxari, G., Dollar, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)

  3. Cai, Z., Vasconcelos, N.: Cascade R-CNN: delving into high quality object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6154–6162 (2018)

  4. Lin, T., Goyal, P., Girshick, R., He, K., Dollar, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017)

  5. Tian, Z., Shen, C., Chen, H., He, T.: FCOS: fully convolutional one-stage object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 9627–9636 (2019)

  6. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. arXiv preprint arXiv:2005.12872 (2020)

  7. Chen, Q., Wang, Y., Yang, T., Zhang, X., Sun, J.: You only look one-level feature. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 13034–13043 (2021)

  8. Zhang, S., Chi, C., Yao, Y., Lei, Z., Li, S.Z.: Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. arXiv preprint arXiv:1912.02424 (2019)

  9. Zhu, B., Wang, J., Jiang, Z., et al.: AutoAssign: differentiable label assignment for dense object detection. arXiv preprint arXiv:2007.03496 (2020)

  10. Lin, T., Maire, M., Belongie, S., et al.: Microsoft COCO: common objects in context. In: The European Conference on Computer Vision (2014)

  11. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2007 (VOC2007) Results

  12. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results

  13. Shao, S., Li, Z., Zhang, T., Peng, C., Yu, G., Zhang, X., Li, J., Sun, J.: Objects365: a large-scale, high-quality dataset for object detection. In: The IEEE International Conference on Computer Vision (2019)

  14. Lin, Y., Sun, H., Liu, N., Bian, Y., Cen, J., Zhou, H.: A lightweight multi-scale context network for salient object detection in optical remote sensing images. In: 2022 26th International Conference on Pattern Recognition (ICPR), pp. 238–244 (2022)

  15. Tu, Z., Wang, C., Li, C., Fan, M., Zhao, H., Luo, B.: ORSI salient object detection via multiscale joint region and boundary model. IEEE Trans. Geosci. Remote Sens. 60, 1–13 (2022)

  16. Cong, R., Zhang, Y., Fang, L., Li, J., Zhao, Y., Kwong, S.: RRNet: relational reasoning network with parallel multiscale attention for salient object detection in optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 60, 1–11 (2022)

  17. Zhang, S., Chi, C., Yao, Y., Lei, Z., Li, S.Z.: Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9759–9768 (2020)

  18. Liu, Z., Hu, H., Lin, Y., Yao, Z., Xie, Z., Wei, Y., Ning, J., Cao, Y., Zhang, Z., Dong, L., et al.: Swin transformer v2: scaling up capacity and resolution. arXiv preprint arXiv:2111.09883 (2021)

  19. Yuan, L., Chen, D., Chen, Y., et al.: Florence: a new foundation model for computer vision. arXiv preprint arXiv:2111.11432 (2021)

  20. Zhang, H., Li, F., Liu, S.: DINO: DETR with improved denoising anchor boxes for end-to-end object detection. arXiv preprint arXiv:2203.03605 (2022)

  21. Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y.: Swin transformer: hierarchical vision transformer using shifted windows. In: International Conference on Computer Vision, pp. 10012–10022 (2021)

  22. Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)

  23. Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)

  24. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)

  25. Dai, J., Li, Y., He, K., Sun, J.: R-FCN: object detection via region-based fully convolutional networks. In: Advances in Neural Information Processing Systems, pp. 379–387 (2016)

  26. Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)

  27. Redmon, J., Farhadi, A.: YOLO9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271 (2017)

  28. Bochkovskiy, A., Wang, C., Liao, H.M.: YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)

  29. Wang, C., Bochkovskiy, A., Liao, H.M.: YOLOv7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv preprint arXiv:2207.02696 (2022)

  30. Chen, H., Wang, Y., Guo, T., Xu, C.: Pre-trained image processing transformer. arXiv:2012.00364 (2020)

  31. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., et al.: An image is worth 16x16 words: transformers for image recognition at scale. arXiv:2010.11929 (2020)

  32. Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A.: Training data-efficient image transformers and distillation through attention. arXiv:2012.12877 (2020)

  33. Wang, W., Xie, E., Li, X., Fan, D., Song, K.: Pyramid vision transformer: a versatile backbone for dense prediction without convolutions. arXiv:2102.12122 (2021)

  34. Wang, J., Chen, K., Yang, S., Loy, C.C., Lin, D.: Region proposal by guided anchoring. In: The IEEE Conference on Computer Vision and Pattern Recognition (2019)

  35. Yang, T., Zhang, X., Li, Z., Zhang, W., Sun, J.: Metaanchor: learning to detect objects with customized anchors. In: Advances in Neural Information Processing Systems (2018)

  36. Zhu, C., He, Y., Savvides, M.: Feature selective anchor-free module for single-shot object detection. In: The IEEE Conference on Computer Vision and Pattern Recognition (2019)

  37. Zhu, C., Chen, F., Shen, Z., Savvides, M.: Soft anchor-point object detection. arXiv preprint arXiv:1911.12448 (2019)

  38. Zhang, X., Wan, F., Liu, C., Ji, R., Ye, Q.: Freeanchor: learning to match anchors for visual object detection. In: Advances in Neural Information Processing Systems (2019)

  39. Li, H., Wu, Z., Zhu, C., Xiong, C., Socher, R., Davis, L.S.: Learning from noisy anchors for one-stage object detection. arXiv preprint arXiv:1912.05086 (2019)

  40. Kim, K., Lee, H.S.: Probabilistic anchor assignment with IOU prediction for object detection. arXiv preprint arXiv:2007.08103 (2020)

  41. Law, H., Deng, J.: CornerNet: detecting objects as paired keypoints. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 734–750 (2018)

  42. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  43. Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: The IEEE Conference on Computer Vision and Pattern Recognition (2009)

  44. He, K., Girshick, R., Dollar, P.: Rethinking imagenet pre-training. In: The IEEE International Conference on Computer Vision (2019)

  45. Zhou, X., Koltun, V., Krähenbühl, P.: Probabilistic two-stage detection. arXiv Preprint arXiv:2103.07461 (2021)

  46. Chu, X., Tian, Z., Wang, Y., Zhang, B., Ren, H., Wei, X., Xia, H., Shen, C.: Twins: revisiting spatial attention design in vision transformers. arXiv preprint arXiv:2104.13840 (2021)

  47. Wang, W., Xie, E., Li, X., Fan, D., Song, K., Liang, D., Lu, T., Luo, P., Shao, L.: Pyramid vision transformer: a versatile backbone for dense prediction without convolutions. arXiv preprint arXiv:2102.12122 (2021)

  48. Li, F., Zhang, H., Liu, S., Guo, J., Ni, L.M., Zhang, L.: DN-DETR: accelerate DETR training by introducing query denoising. arXiv preprint arXiv:2203.01305 (2022)

  49. Li, X., Wang, W., Hu, X., Li, J., Tang, J., Yang, J.: Generalized focal loss V2: learning reliable localization quality estimation for dense object detection. In: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 2021, pp. 11627–11636 (2021)

  50. Qiu, H., Ma, Y., Li, Z., Liu, S., Sun, J.: BorderDet: border feature for dense object detection. In: European Conference on Computer Vision, Springer, pp. 549–564 (2020)

  51. Tan, M., Pang, R., Le, Q.V.: EfficientDet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10781–10790 (2020)

  52. Dai, X., Chen, Y., Xiao, B., Chen, D., Liu, M., Yuan, L., Zhang, L.: Dynamic head: unifying object detection heads with attentions. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7373–7382 (2021)

  53. Xie, S., Girshick, R.B., Dollar, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: CVPR (2017)

  54. Yang, J., Li, C., Gao, J.: Focal modulation networks. arXiv preprint arXiv:2203.11926 (2022)

Acknowledgements

This work was supported in part by the Sichuan Science and Technology Program under Grant 2023NSFSC0503, and the Opening Project of Intelligent Policing Key Laboratory of Sichuan Province.

Author information

Corresponding author

Correspondence to Tingsong Ma.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Ma, T., Huang, Z., Yang, N. et al. Automatic label assignment object detection method on only one feature map. Machine Vision and Applications 35, 2 (2024). https://doi.org/10.1007/s00138-023-01481-4
