Joint deep separable convolution network and border regression reinforcement for object detection

  • Original Article
Neural Computing and Applications

Abstract

Improvements in object detection performance have mainly come from extracting local information near the target region of interest, and this reliance on local cues is also the main reason the resulting features lack semantic information. Considering the importance of scene and semantic information for visual recognition, this paper improves the object detection algorithm in three parts. First, the basic residual convolution module is fused with the separable convolution module to construct a depth-wise separable convolution network (D_SCNet-127 R-CNN). Second, the feature map is sent to a scene-level region proposal self-attention network, which re-identifies the candidate regions; this network is composed of three parallel branches: a semantic segmentation module, a region proposal network, and a region proposal self-attention module. Finally, deep reinforcement learning is combined with a border regression network to localize objects precisely, and a light-weight head network improves the computation speed of the entire model. The model effectively addresses the limitations of feature extraction in traditional object detection and obtains more comprehensive, detailed features. Experiments on the MSCOCO17, Pascal VOC07, and Cityscapes datasets show that the proposed method is effective and scalable.
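To illustrate why a depth-wise separable convolution is attractive for a light-weight detector, the following minimal sketch compares the weight count of a standard convolution with that of a depth-wise convolution followed by a 1 x 1 pointwise convolution. The kernel size and channel counts are hypothetical values chosen only for illustration; they are not taken from D_SCNet-127, and the function names are ours.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depth-wise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes (assumption, not from the paper).
k, c_in, c_out = 3, 256, 256
std = standard_conv_params(k, c_in, c_out)
sep = separable_conv_params(k, c_in, c_out)
ratio = sep / std  # equals 1/c_out + 1/k**2

print(std, sep, round(ratio, 3))  # prints: 589824 67840 0.115
```

For this assumed layer, the separable form needs roughly 11.5% of the weights of the standard convolution, which matches the general ratio 1/c_out + 1/k^2.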

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 61966004, 61663004, 61866004, 61762078), the Guangxi Natural Science Foundation (Nos. 2019GXNSFDA245018, 2018GXNSFDA281009, 2017GXNSFAA198365), the Guangxi “Bagui Scholar” Teams for Innovation and Research Project, the Guangxi Talent Highland Project of Big Data Intelligence and Application, and the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.

Author information

Corresponding author

Correspondence to Zhixin Li.

Ethics declarations

Conflict of interest

We declare that we have no financial and personal relationships with other people or organizations that can inappropriately influence our work. There is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Quan, Y., Li, Z., Chen, S. et al. Joint deep separable convolution network and border regression reinforcement for object detection. Neural Comput & Applic 33, 4299–4314 (2021). https://doi.org/10.1007/s00521-020-05255-1

  • DOI: https://doi.org/10.1007/s00521-020-05255-1
