
Coordinate-based anchor-free module for object detection


Abstract

Despite the impressive performance of recent state-of-the-art detectors, small-target detection, scale variation, and label ambiguity remain challenges. To tackle these issues, we present a coordinate-based anchor-free (CBAF) module for object detection. It can be used as a branch of a single-shot detector (e.g., RetinaNet or SSD), or it can predict class probabilities and box coordinates directly. The main idea of the CBAF module is to predict the category of an object and the adjustments to its box from a part feature and its contextual part features, which are obtained by dividing feature maps according to their spatial coordinates. This is inspired by the fact that human beings can infer an entire object by observing a part of it together with its surrounding environment. The CBAF module encodes and decodes boxes in an anchor-free manner on each feature map at different resolutions during training and testing. During training, we first use the proposed spatial coordinate partition layer to divide feature maps into several parts of size n × n, and then use the proposed contextual building layer to fuse each part with its contextual parts. We demonstrate the CBAF module through a concrete implementation. The CBAF module improves the AP scores of object detection with almost no additional computation when working in conjunction with the anchor-based RetinaNet. Experimental results on the MS-COCO dataset show that the CBAF module alone improves mAP by 1.1% compared with RetinaNet; when the CBAF module works in conjunction with the anchor-based RetinaNet, mAP improves by 2.2%.
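To make the two proposed layers concrete, the following is a minimal PyTorch sketch, assuming the spatial coordinate partition pools each feature map into an n × n grid of parts and the contextual building layer fuses each part with its 3 × 3 neighbourhood of contextual parts through a 1 × 1 convolution. The class name CBAFSketch and these specific design choices are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CBAFSketch(nn.Module):
    """Illustrative sketch of the CBAF idea (not the authors' code).

    Assumptions (hypothetical): the feature map is divided into an
    n x n grid of parts by adaptive average pooling, and each part is
    fused with its 8 contextual neighbours before anchor-free
    classification and box regression.
    """

    def __init__(self, in_channels: int, num_classes: int, n: int = 7):
        super().__init__()
        self.n = n
        # Contextual building: fuse a part with its 8 neighbours (9 parts total).
        self.fuse = nn.Conv2d(in_channels * 9, in_channels, kernel_size=1)
        self.cls_head = nn.Conv2d(in_channels, num_classes, kernel_size=1)
        # Anchor-free box branch: (l, t, r, b) offsets per part.
        self.box_head = nn.Conv2d(in_channels, 4, kernel_size=1)

    def forward(self, feat: torch.Tensor):
        b = feat.size(0)
        # Spatial coordinate partition: pool the map into an n x n grid of parts.
        parts = F.adaptive_avg_pool2d(feat, (self.n, self.n))   # (B, C, n, n)
        # Gather each part's 3 x 3 neighbourhood (zero-padded at the border).
        ctx = F.unfold(parts, kernel_size=3, padding=1)         # (B, C*9, n*n)
        ctx = ctx.view(b, -1, self.n, self.n)                   # (B, C*9, n, n)
        fused = F.relu(self.fuse(ctx))
        # Per-part class scores and box offsets.
        return self.cls_head(fused), self.box_head(fused)


# Example: one head on a 256-channel feature level for 80 COCO classes.
head = CBAFSketch(in_channels=256, num_classes=80, n=7)
cls_logits, box_offsets = head(torch.randn(2, 256, 32, 32))
```

In a multi-scale detector, one such head would be attached to each feature-pyramid level; at test time the (l, t, r, b) offsets of each part would be decoded relative to that part's centre coordinates, in the usual anchor-free fashion.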




Acknowledgements

This work was supported by a grant from the Major State Basic Research Development Program of China (973 Program) (No. 2016YFC0802703).

Author information


Corresponding author

Correspondence to Zhongcai Pei.

Ethics declarations

Conflict of interest

Authors Zhiyong Tang, Jianbing Yang, Zhongcai Pei, Xiao Song, and Pei Pei declare that they have no conflict of interest.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Tang, Z., Yang, J., Pei, Z. et al. Coordinate-based anchor-free module for object detection. Appl Intell 51, 9066–9080 (2021). https://doi.org/10.1007/s10489-021-02373-8
