Global contextual attention for pure regression object detection

  • Original Article
  • Published in: International Journal of Machine Learning and Cybernetics

Abstract

Most object detection frameworks rely on rectangular bounding boxes and recognize object instances individually. However, a bounding box provides only a coarse localization of an object, and the contextual information between objects is not fully exploited, which degrades classification performance. In this paper, we present a novel context-based pure regression object detector that combines a lightweight contextual attention module with a pure regression point representation. Moreover, a threshold filter mask module is designed to speed up the detector by removing a few insignificant points and keeping meaningful positions. Neither module requires handcrafted clustering or post-processing steps, and both are easy to embed in networks. The proposed contextual attention module and threshold filter mask not only improve detection performance, but also speed up training. Experiments show that the proposed context-based pure regression detector improves on the regression points representation method by about 1.5–1.8 AP on the COCO test-dev detection benchmark.
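The abstract does not give implementation details, so the following is only a rough sketch of how a global contextual attention step and a threshold filter mask might look, not the authors' method. It assumes a pooling scheme in which one softmax attention map over all spatial positions produces a single context vector that is added back to every position. The function names, the parameters `w_key` and `w_value`, and the default `threshold` value are all hypothetical.

```python
import numpy as np

def global_context_attention(x, w_key, w_value):
    """Add a globally pooled context vector to every position (illustrative only).

    x:       feature map of shape (C, H, W)
    w_key:   hypothetical projection of shape (C,), scores each position
    w_value: hypothetical transform of shape (C, C), applied to the context
    """
    c, h, w = x.shape
    flat = x.reshape(c, h * w)            # (C, N), with N = H * W positions
    logits = w_key @ flat                 # (N,), one score per position
    logits = logits - logits.max()        # stabilise the softmax numerically
    attn = np.exp(logits)
    attn = attn / attn.sum()              # softmax over all positions
    context = flat @ attn                 # (C,), attention-weighted average
    # Broadcast the transformed context back onto every spatial position.
    return x + (w_value @ context)[:, None, None]

def threshold_filter_mask(scores, threshold=0.5):
    """Keep positions whose score reaches the (assumed) threshold.

    scores: array of shape (H, W) with values in [0, 1]
    returns a boolean mask of the same shape
    """
    return scores >= threshold
```

In this sketch the attention cost grows linearly with the number of positions, which is one plausible reading of "lightweight", and the mask is a plain comparison that needs no clustering or post-processing.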

Acknowledgements

The authors are indebted to the anonymous referees for their critical comments and suggestions for improving this paper. This work was supported by the National Key Research and Development Program of China (2021YFA1000102), and in part by grants from the National Natural Science Foundation of China (Nos. 61673396, 61976245).

Author information

Corresponding author

Correspondence to Mingwen Shao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Fan, B., Shao, M., Li, Y. et al. Global contextual attention for pure regression object detection. Int. J. Mach. Learn. & Cyber. 13, 2189–2197 (2022). https://doi.org/10.1007/s13042-022-01514-w

  • DOI: https://doi.org/10.1007/s13042-022-01514-w
