ACKSNet: adaptive center keypoint selection for object detection

  • Original article
  • The Visual Computer

Abstract

Keypoint-based detectors generate many false positives in object detection because keypoints are often matched incorrectly. In this paper, we propose an adaptive center keypoint selection method (ACKSNet) to address this drawback. We first roughly group detected corners by associative embeddings, which flexibly localizes objects of various shapes and scales and yields a large number of initial candidate proposals. ACKSNet then introduces extra center keypoints and associates them with corner pairs through a geometric method, adding location information within the candidate regions. To ensure that the center keypoints are of high quality, it independently generates a threshold for each center keypoint according to its statistical characteristics. Furthermore, we enrich the scale information of the output feature maps by equipping the backbone network with dilated convolution modules. On the MS COCO dataset, our model achieves an AP of 44.5%, surpassing most existing anchor-free detectors.
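
The abstract does not spell out the geometric association or the statistic behind the per-keypoint threshold, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes a central-region test in the spirit of keypoint-triplet detectors (a corner-pair box is kept only if a selected center keypoint falls inside its central region) and a hypothetical threshold of the form local mean plus `k` times the local standard deviation of heatmap scores. All function names and the parameters `n`, `radius`, and `k` are assumptions made for this example.

```python
from statistics import mean, pstdev


def central_region(box, n=3):
    """Return the middle 1/n portion (per axis) of a box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = (x2 - x1) / (2 * n), (y2 - y1) / (2 * n)
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def adaptive_threshold(heatmap, x, y, radius=1, k=0.5):
    """Hypothetical per-keypoint threshold: mean + k * std of nearby scores."""
    h, w = len(heatmap), len(heatmap[0])
    scores = [
        heatmap[j][i]
        for j in range(max(0, y - radius), min(h, y + radius + 1))
        for i in range(max(0, x - radius), min(w, x + radius + 1))
    ]
    return mean(scores) + k * pstdev(scores)


def select_centers(heatmap, candidates, radius=1, k=0.5):
    """Keep candidate centers whose score exceeds their own local threshold."""
    return [
        (x, y)
        for (x, y) in candidates
        if heatmap[y][x] > adaptive_threshold(heatmap, x, y, radius, k)
    ]


def filter_boxes(boxes, centers, n=3):
    """Keep corner-pair boxes whose central region contains a selected center."""
    kept = []
    for box in boxes:
        rx1, ry1, rx2, ry2 = central_region(box, n)
        if any(rx1 <= x <= rx2 and ry1 <= y <= ry2 for (x, y) in centers):
            kept.append(box)
    return kept


# Toy example: a 5x5 center heatmap with one clear peak at (x=2, y=2).
heatmap = [[0.1] * 5 for _ in range(5)]
heatmap[2][2] = 0.9
centers = select_centers(heatmap, [(2, 2), (0, 0)])
boxes = filter_boxes([(0, 0, 4, 4), (0, 0, 1, 1)], centers)
```

In this toy setup the flat-background candidate at (0, 0) does not exceed its own local threshold, and the small box at the image corner is discarded because no selected center lies in its central region. The point of a per-keypoint threshold, as opposed to one global cutoff, is that each candidate is judged against its own neighborhood.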

Data availability

The datasets used or analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This research was supported by the Science and Technology Research Project of Wuhu City (No. 2020yf48) and the Research Foundation of the Institute of Environment-friendly Materials and Occupational Health (Wuhu), Anhui University of Science and Technology (No. ALW2021YF04).

Author information

Corresponding author

Correspondence to Lixin Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liang, X., Wang, L., Cheng, W. et al. ACKSNet: adaptive center keypoint selection for object detection. Vis Comput 39, 6073–6084 (2023). https://doi.org/10.1007/s00371-022-02712-x

  • DOI: https://doi.org/10.1007/s00371-022-02712-x
