Multi-branch Bounding Box Regression for Object Detection

Published in: Cognitive Computation

Abstract

Localization and classification are two key components of visual object detection. In recent years, object detectors have increasingly focused on designing various localization branches, and bounding box regression is vital for two-stage detectors. We therefore propose a multi-branch bounding box regression method, called Multi-Branch R-CNN, for robust object localization. Multi-Branch R-CNN consists of a fully connected head and a fully convolutional head. The fully convolutional head exploits spatial semantics and is complementary to the fully connected head, which prefers local features. The features extracted from the two localization branches are fused and then passed to the next stage for classification and regression. The two branches cooperate to predict more precise localization, which significantly improves the performance of the detector. Extensive experiments were conducted on the public PASCAL VOC and MS COCO benchmarks. On the COCO dataset, Multi-Branch R-CNN with a ResNet-101 backbone achieved state-of-the-art single-model results, obtaining an mAP of 43.2. Extensive comparative experiments demonstrate the effectiveness of the proposed method.
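The two-branch head described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the layer sizes, the 3×3 kernel, the global average pooling, and the element-wise-sum fusion are all assumptions made for the sketch; only the overall pattern (a fully connected branch and a fully convolutional branch whose features are fused before the final classification and regression layers) comes from the paper's description.

```python
import numpy as np

rng = np.random.default_rng(0)

def fc_branch(roi_feat, w):
    # Fully connected head: flatten the RoI features and project to a vector.
    return roi_feat.reshape(-1) @ w                      # shape (d,)

def conv_branch(roi_feat, k):
    # Fully convolutional head: a 3x3 convolution (valid padding)
    # followed by global average pooling, keeping spatial semantics.
    c, h, wd = roi_feat.shape
    out = np.zeros((k.shape[0], h - 2, wd - 2))
    for o in range(k.shape[0]):
        for i in range(h - 2):
            for j in range(wd - 2):
                out[o, i, j] = np.sum(roi_feat[:, i:i + 3, j:j + 3] * k[o])
    return out.mean(axis=(1, 2))                         # shape (d,)

# RoI-pooled feature map: 256 channels at 7x7, a typical two-stage setting.
roi = rng.standard_normal((256, 7, 7))
w = rng.standard_normal((256 * 7 * 7, 128)) * 0.01       # FC weights (assumed size)
k = rng.standard_normal((128, 256, 3, 3)) * 0.01         # conv kernels (assumed size)

# Fuse the two localization branches; the fused feature would then feed
# the classification and bounding-box-regression layers of the next stage.
fused = fc_branch(roi, w) + conv_branch(roi, k)
print(fused.shape)  # (128,)
```

In a real detector both branches would be multi-layer trained networks; the point of the sketch is only that the two heads consume the same RoI features and contribute jointly to one fused representation.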



Acknowledgements

The authors would like to thank the editor and anonymous reviewers for their valuable comments and suggestions, which were very helpful in improving this paper.

Funding

This work was supported in part by the NSFC Key Project of International (Regional) Cooperation and Exchanges (No. 61860206004), the National Natural Science Foundation of China (No. 61976004), and the Collegiate Natural Science Fund of Anhui Province (No. KJ2017A014).

Author information

Corresponding author

Correspondence to Si-Bao Chen.

Ethics declarations

Conflict of Interest

The authors declare no competing interests.

Ethics Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article


Cite this article

Yuan, HS., Chen, SB., Luo, B. et al. Multi-branch Bounding Box Regression for Object Detection. Cogn Comput 15, 1300–1307 (2023). https://doi.org/10.1007/s12559-021-09983-x

