ACKSNet: adaptive center keypoint selection for object detection

Liang, Xingzhu; Wang, Lixin; Cheng, Wei; Yan, Xinyun; Chen, Qing

doi:10.1007/s00371-022-02712-x

ACKSNet: adaptive center keypoint selection for object detection

Original article
Published: 02 November 2022

Volume 39, pages 6073–6084, (2023)
Cite this article

The Visual Computer Aims and scope Submit manuscript

Xingzhu Liang^1,2,
Lixin Wang¹,
Wei Cheng¹,
Xinyun Yan³ &
…
Qing Chen⁴

294 Accesses
1 Altmetric
Explore all metrics

Abstract

Keypoint-based detectors generate a large number of false positives due to incorrect keypoint matching in the object detection task. In this paper, we propose an adaptive center keypoint selection method (ACKSNet) to address the false-positive drawback. We first roughly group detected corners by associative embeddings, which flexibly localize objects of various shapes and scales to obtain a large number of initial candidate proposals. Then, ACKSNet associates introduced extra center keypoints with corner pairs through the geometric method to add location information in the candidate regions. And it independently generates a threshold for each center keypoint according to their statistical characteristics to ensure the high quality of center keypoints. Furthermore, we enrich the scale information of the output feature maps by equipping the backbone network with dilated convolution modules. On the MS COCO dataset, our model achieves an AP of 44.5%, surpassing most existing anchor-free detectors.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

C-FCN: Corners-based fully convolutional network for visual object detection

Article 06 August 2020

Anchor-free object detection with mask attention

Article Open access 14 July 2020

Boundary Information Aggregation and Adaptive Keypoint Combination Enhanced Object Detection

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Data availability

The datasets used or analyzed during the current study are available from the corresponding author on reasonable request.

References

Cao, J., Cholakkal, H., Anwer, R.M., Khan, F.S., Pang, Y., Shao, L.: D2Det: towards high quality object detection and instance segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11485–11494 (2020). https://doi.org/10.1109/CVPR42600.2020.01150
Fan, Q., Zhuo, W., Tang, C.-K., Tai, Y.-W.: Few-shot object detection with attention-RPN and multi-relation detector. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4013–4022 (2020). https://doi.org/10.1109/CVPR42600.2020.00407
Li, Y., Chen, Y., Wang, N., Zhang, Z.: Scale-aware trident networks for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6054–6063 (2019). https://doi.org/10.1109/ICCV.2019.00615
Bochkovskiy, A., Wang, C.-Y., Liao, H.-Y.M.: YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020). https://doi.org/10.48550/arXiv.2004.10934
Zhang, H., Hu, Z., Hao, R.: Joint information fusion and multi-scale network model for pedestrian detection. Vis. Comput. 37(8), 2433–2442 (2021). https://doi.org/10.1007/s00371-020-01997-0
Article Google Scholar
Chen, J., Wu, Q., Liu, D., Xu, T.: Foreground-background imbalance problem in deep object detectors: a review. In: 2020 IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), pp. 285–290 (2020). IEEE. https://doi.org/10.1109/MIPR49039.2020.00066
Law, H., Deng, J.: CornerNet: detecting objects as paired keypoints. Int. J. Comput. Vis. 128(3), 642–656 (2020). https://doi.org/10.1007/s11263-019-01204-1
Article Google Scholar
Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., Tian, Q.: CenterNet: keypoint triplets for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6569–6578 (2019). https://doi.org/10.1109/ICCV.2019.00667
Zhou, X., Zhuo, J., Krahenbuhl, P.: Bottom-up object detection by grouping extreme and center points. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 850–859 (2019). https://doi.org/10.1109/CVPR.2019.00094
Saeidi, M., Arabsorkhi, A.: A novel backbone architecture for pedestrian detection based on the human visual system. Vis. Comput. 38(6), 2223–2237 (2022). https://doi.org/10.1007/s00371-021-02280-6
Article Google Scholar
Dong, Z., Li, G., Liao, Y., Wang, F., Ren, P., Qian, C.: CentripetalNet: pursuing high-quality keypoint pairs for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10519–10528 (2020). https://doi.org/10.1109/CVPR42600.2020.01053
Newell, A., Huang, Z., Deng, J.: Associative embedding: end-to-end learning for joint detection and grouping. arXiv preprint arXiv:1611.05424 (2016). https://doi.org/10.48550/arXiv.1611.05424
Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: European Conference on Computer Vision, pp. 483–499 (2016). Springer. https://doi.org/10.1007/978-3-319-46484-8_29
Lin, T.-Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017). https://doi.org/10.1109/CVPR.2017.106
Zhang, S., Chi, C., Yao, Y., Lei, Z., Li, S.Z.: Bridging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9759–9768 (2020). https://doi.org/10.1109/CVPR42600.2020.00978
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: common objects in context. In: European Conference on Computer Vision, pp. 740–755. Springer (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2818–2826 (2016). https://doi.org/10.1109/CVPR.2016.308
Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017). https://doi.org/10.1109/CVPR.2017.243
Ben Fredj, H., Bouguezzi, S., Souani, C.: Face recognition in unconstrained environment with CNN. Vis. Comput. 37(2), 217–226 (2021). https://doi.org/10.1007/s00371-020-01794-9
Article Google Scholar
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017). https://doi.org/10.1109/TPAMI.2016.2577031
Article Google Scholar
Qin, Z., Li, Z., Zhang, Z., Bao, Y., Yu, G., Peng, Y., Sun, J.: ThunderNet: towards real-time generic object detection on mobile devices. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6718–6727 (2019). https://doi.org/10.1109/ICCV.2019.00682
Xie, X., Cheng, G., Wang, J., Yao, X., Han, J.: Oriented R-CNN for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3520–3529 (2021). https://doi.org/10.1109/ICCV48922.2021.00350
Roy, K., Sahay, R.R.: A robust multi-scale deep learning approach for unconstrained hand detection aided by skin segmentation. Vis. Comput. (2021). https://doi.org/10.1007/s00371-021-02157-8
Article Google Scholar
Li, H., Wu, Z., Zhu, C., Xiong, C., Socher, R., Davis, L.S.: Learning from noisy anchors for one-stage object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10588–10597 (2020). https://doi.org/10.1109/CVPR42600.2020.01060
Wei, L., Cui, W., Hu, Z., Sun, H., Hou, S.: A single-shot multi-level feature reused neural network for object detection. Vis. Comput. 37(1), 133–142 (2021). https://doi.org/10.1007/s00371-019-01787-3
Article Google Scholar
Huang, X., Wang, X., Lv, W., Bai, X., Long, X., Deng, K., Dang, Q., Han, S., Liu, Q., Hu, X., et al.: PP-YOLOV2: a practical object detector. arXiv preprint arXiv:2104.10419 (2021). https://doi.org/10.48550/arXiv.2104.10419
Zhang, T., Li, Z., Sun, Z., Zhu, L.: A fully convolutional anchor-free object detector. Vis. Comput. (2022). https://doi.org/10.1007/s00371-021-02357-2
Article Google Scholar
Uijlings, J.R., Van De Sande, K.E., Gevers, T., Smeulders, A.W.: Selective search for object recognition. Int. J. Comput. Vis. 104(2), 154–171 (2013). https://doi.org/10.1007/s11263-013-0620-5
Article Google Scholar
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017). https://doi.org/10.1109/TPAMI.2018.2844175
Lin, T.-Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988 (2017). https://doi.org/10.1109/ICCV.2017.324
Liu, D., Cui, Y., Chen, Y., Zhang, J., Fan, B.: Video object detection for autonomous driving: motion-aid feature calibration. Neurocomputing 409, 1–11 (2020). https://doi.org/10.1016/j.neucom.2020.05.027
Article Google Scholar
Xiao, J., Xu, J., Tian, C., Han, P., You, L., Zhang, S.: A serial attention frame for multi-label waste bottle classification. Appl. Sci. 12(3), 1742 (2022). https://doi.org/10.3390/app12031742
Article Google Scholar
Huang, L., Yang, Y., Deng, Y., Yu, Y.: DenseBox: unifying landmark localization with end to end object detection. arXiv preprint arXiv:1509.04874 (2015). https://doi.org/10.48550/arXiv.1509.04874
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016). https://doi.org/10.1109/CVPR.2016.91
Tian, Z., Shen, C., Chen, H., He, T.: FCOS: a simple and strong anchor-free object detector. IEEE Trans. Pattern Anal. Mach. Intell. 5(99), 1 (2020). https://doi.org/10.1109/TPAMI.2020.3032166
Article Google Scholar
Liu, D., Cui, Y., Tan, W., Chen, Y.: SG-Net: spatial granularity network for one-stage video instance segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9816–9825 (2021). https://doi.org/10.1109/CVPR46437.2021.00969
Cui, Y., Yan, L., Cao, Z., Liu, D.: TF-Blender: temporal feature blender for video object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 8138–8147 (2021). https://doi.org/10.1109/ICCV48922.2021.00803
Duan, K., Bai, S., Xie, L., Qi, H., Huang, Q., Tian, Q.: CenterNet++ for object detection. arXiv preprint arXiv:2204.08394 (2022). https://doi.org/10.48550/arXiv.2204.08394
Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End-to-end object detection with transformers. In: European Conference on Computer Vision, pp. 213–229 (2020). Springer. https://doi.org/10.1007/978-3-030-58452-8_13
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention is all you need. In: arXiv (2017). https://doi.org/10.48550/arXiv.1706.03762
Huang, K., Tian, C., Su, J., Lin, J.C.-W.: Transformer-based cross reference network for video salient object detection. Pattern Recognit. Lett. 160, 122–127 (2022). https://doi.org/10.1016/j.patrec.2022.06.006
Article Google Scholar
Zhang, H., Li, F., Liu, S., Zhang, L., Su, H., Zhu, J., Ni, L.M., Shum, H.-Y.: DINO: DETR with improved denoising anchor boxes for end-to-end object detection. arXiv preprint arXiv:2203.03605 (2022). https://doi.org/10.48550/arXiv.2203.03605
Liu, S., Li, F., Zhang, H., Yang, X., Qi, X., Su, H., Zhu, J., Zhang, L.: DAB-DETR: dynamic anchor boxes are better queries for DETR. arXiv preprint arXiv:2201.12329 (2022). https://doi.org/10.48550/arXiv.2201.12329
Li, F., Zhang, H., Liu, S., Guo, J., Ni, L.M., Zhang, L.: DN-DETR: accelerate DETR training by introducing query denoising. arXiv preprint arXiv:2203.01305 (2022). https://doi.org/10.48550/arXiv.2203.01305
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016). https://doi.org/10.1109/CVPR.2016.90
Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015). https://doi.org/10.1109/ICCV.2015.169
Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., DeVito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch (2017)
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014). https://doi.org/10.48550/arXiv.1412.6980
Bodla, N., Singh, B., Chellappa, R., Davis, L.S.: Soft-NMS—improving object detection with one line of code. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 5561–5569 (2017). https://doi.org/10.1109/ICCV.2017.593
Shrivastava, A., Sukthankar, R., Malik, J., Gupta, A.: Beyond skip connections: top-down modulation for object detection. arXiv preprint arXiv:1612.06851 (2016). https://doi.org/10.48550/arXiv.1612.06851
Pang, J., Chen, K., Shi, J., Feng, H., Ouyang, W., Lin, D.: Libra R-CNN: towards balanced learning for object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 821–830 (2019). https://doi.org/10.1109/CVPR.2019.00091
Zhang, S., Wen, L., Bian, X., Lei, Z., Li, S.Z.: Single-shot refinement neural network for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4203–4212 (2018). https://doi.org/10.1109/CVPR.2018.00442
Cai, Z., Vasconcelos, N.: Cascade R-CNN: delving into high quality object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6154–6162 (2018). https://doi.org/10.1109/CVPR.2018.00644
Lu, X., Li, B., Yue, Y., Li, Q., Yan, J.: Grid R-CNN. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7363–7372 (2019). https://doi.org/10.1109/CVPR.2019.00754
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018). https://doi.org/10.48550/arXiv.1804.02767
Zhu, C., Chen, F., Shen, Z., Savvides, M.: Soft anchor-point object detection. In: European Conference on Computer Vision, pp. 91–107 (2020). Springer. https://doi.org/10.1007/978-3-030-58545-7_6

Download references

Acknowledgements

This research was supported by the Science and Technology Research Project of Wuhu City (No. 2020yf48), the Research Foundation of the Institute of Environment-friendly Materials and Occupational Health (Wuhu), Anhui University of Science and Technology (No. ALW2021YF04).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Anhui University of Science and Technology, Huainan, 232001, China
Xingzhu Liang, Lixin Wang & Wei Cheng
Institute of Environment-friendly Materials and Occupational Health, Anhui University of Science and Technology, Wuhu, 241003, China
Xingzhu Liang
Jinling Institute of Technology, Jiangsu AI Transportation Innovations & Applications Engineering Research Center, Nanjing, 211169, China
Xinyun Yan
Nanjing Highway Development (Group) Co., Ltd., Nanjing, 210000, China
Qing Chen

Authors

Xingzhu Liang
View author publications
You can also search for this author inPubMed Google Scholar
Lixin Wang
View author publications
You can also search for this author inPubMed Google Scholar
Wei Cheng
View author publications
You can also search for this author inPubMed Google Scholar
Xinyun Yan
View author publications
You can also search for this author inPubMed Google Scholar
Qing Chen
View author publications
You can also search for this author inPubMed Google Scholar

Corresponding author

Correspondence to Lixin Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Liang, X., Wang, L., Cheng, W. et al. ACKSNet: adaptive center keypoint selection for object detection. Vis Comput 39, 6073–6084 (2023). https://doi.org/10.1007/s00371-022-02712-x

Download citation

Accepted: 20 October 2022
Published: 02 November 2022
Issue Date: December 2023
DOI: https://doi.org/10.1007/s00371-022-02712-x

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

ACKSNet: adaptive center keypoint selection for object detection

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

C-FCN: Corners-based fully convolutional network for visual object detection

Anchor-free object detection with mask attention

Boundary Information Aggregation and Adaptive Keypoint Combination Enhanced Object Detection

Explore related subjects

Data availability

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now