Feature Enhancement for Multi-scale Object Detection

Neural Processing Letters

Abstract

Recently, deep learning has brought great progress in object detection. However, we believe that traditional hand-crafted features may still contain valuable human knowledge complementary to features learned from raw data. Besides, almost all top-performing object detection methods extract features using backbones originally designed for image classification. The generated features are often highly semantic, which benefits global image classification but may lose details useful for object localization and recognition at various scales. To alleviate these problems, a feature enhancement method is proposed in this paper. Inspired by the success of histograms of oriented gradients in traditional object detection research, we construct feature channels based on oriented gradients as input to convolutional neural networks to capture discriminative local orientations. The oriented gradients and RGB features are stacked as the input of the network to enhance the input feature representation. For accurate object localization and recognition, we employ dilated convolutions to increase the spatial resolution of output feature maps while maintaining their respective receptive fields. Hierarchical feature maps with different receptive fields are aggregated into the final feature representation for multi-scale object detection without extra upsampling. Experimental results on PASCAL VOC 2007 and 2012 demonstrate the superiority of the proposed method over state-of-the-art methods for multi-scale object detection.
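The two ideas in the abstract, oriented-gradient input channels and dilated convolutions that preserve receptive fields, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the number of orientation bins, the gradient operator, or the layer configuration, so `num_bins`, the central-difference gradient, the hard bin assignment, and the example layer lists are all assumptions, and every function name is hypothetical. Plain Python lists are used only to keep the sketch self-contained; a real pipeline would use an array library.

```python
import math


def oriented_gradient_channels(gray, num_bins=4):
    """Split gradient magnitude into num_bins orientation channels.

    gray is an H x W list of lists of floats. Channel k holds the gradient
    magnitude of pixels whose unsigned orientation falls into bin k.
    The bin count and hard assignment are illustrative assumptions.
    """
    h, w = len(gray), len(gray[0])
    channels = [[[0.0] * w for _ in range(h)] for _ in range(num_bins)]
    bin_width = math.pi / num_bins
    for y in range(h):
        for x in range(w):
            # Central differences with replicated borders.
            gx = gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]
            gy = gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            # Unsigned orientation in [0, pi); clamp guards rounding at pi.
            angle = math.atan2(gy, gx) % math.pi
            k = min(int(angle / bin_width), num_bins - 1)
            channels[k][y][x] = magnitude
    return channels


def stack_input(rgb_planes, gradient_channels):
    """Concatenate 3 RGB planes and the gradient planes, channel-first."""
    return list(rgb_planes) + list(gradient_channels)


def effective_kernel_size(kernel_size, dilation):
    """Spatial extent covered by a dilated kernel."""
    return dilation * (kernel_size - 1) + 1


def receptive_field(layers):
    """Return (receptive field, total stride) for a layer stack.

    layers is a list of (kernel_size, stride, dilation) tuples,
    applied in order along one spatial dimension.
    """
    rf, jump = 1, 1
    for kernel_size, stride, dilation in layers:
        rf += (effective_kernel_size(kernel_size, dilation) - 1) * jump
        jump *= stride
    return rf, jump


if __name__ == "__main__":
    # A vertical step edge: all gradient energy lands in the first bin.
    gray = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
    grads = oriented_gradient_channels(gray, num_bins=4)
    net_input = stack_input([gray, gray, gray], grads)
    print(len(net_input), "input channels")

    # Hypothetical layer stacks: stride-2 pooling versus stride-1 + dilation 2.
    print(receptive_field([(3, 1, 1), (2, 2, 1), (3, 1, 1)]))
    print(receptive_field([(3, 1, 1), (2, 1, 1), (3, 1, 2)]))
```

In the two hypothetical stacks at the end, replacing the stride-2 pooling with a stride-1 pooling and giving the following 3x3 convolution a dilation of 2 leaves the receptive field unchanged while halving the total stride, which is the trade-off the abstract attributes to dilated convolutions.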

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Nos. 61172141, 61976231), the Guangdong Basic and Applied Basic Research Foundation (No. 2019A1515011869), the Special Program for Applied Research on Super Computation of the NSFC-Guangdong Joint Fund (the second phase, No. U1501501), the Project on the Integration of Industry, Education and Research of Guangdong Province (No. 2013B090500013), the Science and Technology Program of Guangzhou (Nos. 201803030029, 2014J4100092), and the Major Projects for the Innovation of Industry and Research of Guangzhou (No. 2014Y2-00213).

Author information

Corresponding author

Correspondence to Huicheng Zheng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zheng, H., Chen, J., Chen, L. et al. Feature Enhancement for Multi-scale Object Detection. Neural Process Lett 51, 1907–1919 (2020). https://doi.org/10.1007/s11063-019-10182-x

  • DOI: https://doi.org/10.1007/s11063-019-10182-x
