EFLDet: enhanced feature learning for object detection

  • Review

Neural Computing and Applications

Abstract

In this paper, we propose an enhanced feature learning method for object detection that addresses the backbone, neck and head, the three main components of an object detector. For the backbone network, we build a bi-residual network that extracts salient information by extending the residual block with an aggregate connection for global feature representation. For the neck network, we design an enhanced feature pyramid network that fuses spatial and channel-wise information across the different receptive fields of the feature maps; it introduces an attention module with a global context block and a dilated convolution module to reduce information decay during feature fusion. For the detection head, we construct a trident head network, consisting of a fully connected head, a convolution head and an attention head, to improve the confidence of classification and regression. Experiments conducted on the COCO dataset show that the proposed approaches are widely applicable and verify their effectiveness for object detection.
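The abstract does not spell out the dilated convolution module, so the following is only a minimal illustrative sketch of the general idea behind it: a dilated kernel covers a wider receptive field without adding weights. The function names `effective_kernel_size` and `dilated_conv1d` are hypothetical, the example is one-dimensional for brevity, and it is not the authors' implementation.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded), stride-1 dilated cross-correlation in 1-D.

    Illustrative only: kernel taps are spaced `dilation` samples apart.
    """
    span = effective_kernel_size(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        out.append(
            sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        )
    return out


if __name__ == "__main__":
    signal = [1, 2, 3, 4, 5, 6, 7]
    kernel = [1, 0, -1]  # same three weights at every dilation rate
    for d in (1, 2, 3):
        print(d, effective_kernel_size(len(kernel), d), dilated_conv1d(signal, kernel, d))
```

With the same three weights, dilation rates 1, 2 and 3 cover spans of 3, 5 and 7 samples, which is the sense in which branches with different rates see different receptive fields.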

Acknowledgements

This work is supported by the Guangdong Basic and Applied Basic Research Foundation (No. 2020A1515010616), the Science and Technology Program of Guangzhou (No. 202102020524, No. 202007040005), the Guangdong Innovative Research Team Program (No. 2014ZT05G157), the Key-Area Research and Development Program of Guangdong Province (2019B010136001), and the Science and Technology Planning Project of Guangdong Province (LZC0023).

Author information

Corresponding authors

Correspondence to Zhenguo Yang or Wenyin Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Liao, Y., Zhang, G., Yang, Z. et al. EFLDet: enhanced feature learning for object detection. Neural Comput & Applic 34, 1033–1045 (2022). https://doi.org/10.1007/s00521-021-06607-1

  • DOI: https://doi.org/10.1007/s00521-021-06607-1
