Enhance the recognition ability to occlusions and small objects with Robust Faster R-CNN

  • Original Article
  • Published:
International Journal of Machine Learning and Cybernetics

Abstract

Recognizing objects at vastly different scales and objects that are partly occluded is a fundamental challenge in computer vision. This paper addresses the problem by proposing Robust Faster R-CNN, an approach for detecting objects in multi-label images. Robust Faster R-CNN employs a cascaded network structure based on the Faster R-CNN architecture to extract features from objects of different sizes. The proposed design is more robust than Faster R-CNN in two respects. First, it replaces the RoIPooling operation with RoIAligns to eliminate the harsh quantization performed by RoIPooling, and it extends this into a multi-scale RoIAligns operation that uses multiple pool sizes so that the network can adapt its detection ability to objects of different sizes. Second, we combine an adversarial network with the proposed network to generate occluded training samples that significantly degrade the model's classification performance; training on these samples improves its robustness to occlusions. Experimental results on the PASCAL VOC 2012 and 2007 datasets show that the proposed approach outperforms several state-of-the-art approaches.
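
The difference between quantized RoIPooling and a RoIAlign-style operation with several pool sizes can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`bilinear`, `roi_pool`, `roi_align`, `multi_scale_roi_align`), the single-channel feature map, the 2 x 2 sampling grid per bin, the pool sizes `(2, 4)`, and the choice to concatenate the per-scale outputs are all hypothetical, since the abstract does not specify them.

```python
import math

def bilinear(feat, y, x):
    """Bilinearly interpolate a 2-D map (list of rows) at a real-valued (y, x).
    Assumption: feat[r][c] is the value located at integer coordinate (r, c)."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0][x0] * (1 - dx) + feat[y0][x1] * dx
    bottom = feat[y1][x0] * (1 - dx) + feat[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_pool(feat, box, pool_size):
    """RoIPooling-style max pooling: box corners and bin edges snap to integers."""
    h_feat, w_feat = len(feat), len(feat[0])
    x1, y1, x2, y2 = (int(round(v)) for v in box)  # quantization step
    h, w = max(y2 - y1 + 1, 1), max(x2 - x1 + 1, 1)
    out = []
    for i in range(pool_size):
        ys = min(max(y1 + math.floor(i * h / pool_size), 0), h_feat)
        ye = min(max(y1 + math.ceil((i + 1) * h / pool_size), 0), h_feat)
        row = []
        for j in range(pool_size):
            xs = min(max(x1 + math.floor(j * w / pool_size), 0), w_feat)
            xe = min(max(x1 + math.ceil((j + 1) * w / pool_size), 0), w_feat)
            cells = (feat[y][x] for y in range(ys, ye) for x in range(xs, xe))
            row.append(max(cells, default=0.0))
        out.append(row)
    return out

def roi_align(feat, box, pool_size, samples=2):
    """RoIAlign-style pooling: no rounding; each bin averages samples x samples
    bilinearly interpolated points placed on a regular grid inside the bin."""
    x1, y1, x2, y2 = box
    bin_h = (y2 - y1) / pool_size
    bin_w = (x2 - x1) / pool_size
    out = []
    for i in range(pool_size):
        row = []
        for j in range(pool_size):
            acc = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    y = y1 + (i + (sy + 0.5) / samples) * bin_h
                    x = x1 + (j + (sx + 0.5) / samples) * bin_w
                    acc += bilinear(feat, y, x)
            row.append(acc / (samples * samples))
        out.append(row)
    return out

def multi_scale_roi_align(feat, box, pool_sizes=(2, 4)):
    """Assumed combination rule: flatten each scale's output and concatenate."""
    vec = []
    for p in pool_sizes:
        for row in roi_align(feat, box, p):
            vec.extend(row)
    return vec

# Horizontal ramp: the value of every cell equals its column index.
feat = [[float(x) for x in range(8)] for _ in range(8)]
box = (1.3, 1.3, 5.7, 5.7)  # (x1, y1, x2, y2) in feature-map coordinates

print("RoIPooling:", roi_pool(feat, box, 2))
print("RoIAlign:  ", roi_align(feat, box, 2))
print("Multi-scale vector length:", len(multi_scale_roi_align(feat, box)))
```

On the ramp input, the aligned version returns values that track the fractional box coordinates, while the quantized version only sees the box after its corners have been rounded to whole cells. For a small object that spans only a few feature-map cells, that rounding can shift the pooled region by a large fraction of the object, which is the motivation the abstract gives for the replacement.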

Acknowledgements

This work is supported by the National Natural Science Foundation of China (Nos. 61966004, 61663004, 61762078, 61866004), the Guangxi Natural Science Foundation (Nos. 2016GXNSFAA380146, 2017GXNSFAA198365, 2018GXNSFDA281009), the Research Fund of Guangxi Key Lab of Multi-source Information Mining and Security (16-A-03-02, MIMS18-08), the Guangxi Special Project of Science and Technology Base and Talents (AD16380008), the Guangxi “Bagui Scholar” Teams for Innovation and Research Project, and the Guangxi Collaborative Innovation Center of Multi-source Information Integration and Intelligent Processing.

Author information

Corresponding author

Correspondence to Zhixin Li.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhou, T., Li, Z. & Zhang, C. Enhance the recognition ability to occlusions and small objects with Robust Faster R-CNN. Int. J. Mach. Learn. & Cyber. 10, 3155–3166 (2019). https://doi.org/10.1007/s13042-019-01006-4

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s13042-019-01006-4

Keywords
