Abstract
Recently, deep learning has brought great progress in object detection. However, we believe that traditional hand-crafted features may still contain valuable human knowledge that is complementary to features learned from raw data. Moreover, almost all top-performing object detection methods extract features with backbones originally designed for image classification. The resulting features are often highly semantic, which benefits global image classification but may lose details useful for localizing and recognizing objects at various scales. To alleviate these problems, this paper proposes a feature enhancement method. Inspired by the success of histograms of oriented gradients in traditional object detection research, we construct feature channels based on oriented gradients as input to convolutional neural networks to capture discriminative local orientations. The oriented gradient channels and RGB channels are stacked as the network input to enhance the input feature representation. For accurate object localization and recognition, we employ dilated convolutions to increase the spatial resolution of output feature maps while maintaining their respective receptive fields. Hierarchical feature maps with different receptive fields are aggregated into the final feature representation for multi-scale object detection without extra upsampling. Experimental results on PASCAL VOC 2007 and 2012 demonstrate the superiority of the proposed method over state-of-the-art methods for multi-scale object detection.
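The idea of stacking oriented gradient channels with RGB as the network input can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the choice of four orientation bins, the central-difference gradients, the hard (non-interpolated) binning, and the peak normalization are all assumptions made for illustration only.

```python
import numpy as np


def oriented_gradient_channels(rgb, num_bins=4):
    """Return an (H, W, num_bins) array of per-pixel oriented gradient magnitudes.

    Hypothetical helper: the bin count, central differences, and hard binning
    are illustrative assumptions, not settings taken from the paper.
    """
    gray = rgb.astype(np.float64).mean(axis=2)
    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    gy, gx = np.gradient(gray)
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, pi), as in HOG-style descriptors.
    orientation = np.mod(np.arctan2(gy, gx), np.pi)
    # Map each orientation to a bin; clip guards against rounding up to num_bins.
    bin_index = np.minimum((orientation / np.pi * num_bins).astype(int), num_bins - 1)
    channels = np.zeros(gray.shape + (num_bins,), dtype=np.float64)
    rows, cols = np.indices(gray.shape)
    # Each pixel writes its gradient magnitude into the channel of its orientation bin.
    channels[rows, cols, bin_index] = magnitude
    return channels


def stack_rgb_and_gradients(rgb, num_bins=4):
    """Concatenate scaled RGB with oriented gradient channels: (H, W, 3 + num_bins)."""
    grad = oriented_gradient_channels(rgb, num_bins)
    peak = grad.max()
    if peak > 0:
        grad = grad / peak  # assumed normalization so both inputs lie in [0, 1]
    rgb_scaled = rgb.astype(np.float64) / 255.0
    return np.concatenate([rgb_scaled, grad], axis=2)
```

Under these assumptions, a network consuming the stacked tensor would need a first convolutional layer that accepts `3 + num_bins` input channels instead of 3.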
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Nos. 61172141, 61976231), the Guangdong Basic and Applied Basic Research Foundation (No. 2019A1515011869), the Special Program for Applied Research on Super Computation of the NSFC-Guangdong Joint Fund (the second phase, No. U1501501), the Project on the Integration of Industry, Education and Research of Guangdong Province (No. 2013B090500013), the Science and Technology Program of Guangzhou (Nos. 201803030029, 2014J4100092), and the Major Projects for the Innovation of Industry and Research of Guangzhou (No. 2014Y2-00213).
Cite this article
Zheng, H., Chen, J., Chen, L. et al. Feature Enhancement for Multi-scale Object Detection. Neural Process Lett 51, 1907–1919 (2020). https://doi.org/10.1007/s11063-019-10182-x