Feature Enhancement for Multi-scale Object Detection

Zheng, Huicheng; Chen, Jiajie; Chen, Lvran; Li, Ye; Yan, Zhiwei

doi:10.1007/s11063-019-10182-x

Published: 09 January 2020

Volume 51, pages 1907–1919, (2020)

Neural Processing Letters

Huicheng Zheng^1,2 (ORCID: orcid.org/0000-0002-6729-4176), Jiajie Chen^3, Lvran Chen^1,2, Ye Li^1,2 & Zhiwei Yan^1,2

Abstract

Recently, deep learning has brought great progress in object detection. However, we believe that traditional hand-crafted features may still contain valuable human knowledge that is complementary to features learned from raw data. In addition, almost all top-performing object detection methods extract features with backbones originally designed for image classification. The resulting features are often highly semantic, which benefits global image classification, but they may lose details useful for object localization and recognition at various scales. To alleviate these problems, this paper proposes a feature enhancement method. Inspired by the success of histograms of oriented gradients in traditional object detection research, we construct feature channels based on oriented gradients as input to convolutional neural networks to capture discriminative local orientations. The oriented-gradient channels and RGB channels are stacked as the network input to enhance the input feature representation. For accurate object localization and recognition, we employ dilated convolutions to increase the spatial resolution of output feature maps while maintaining their respective receptive fields. Hierarchical feature maps with different receptive fields are aggregated into the final feature representation for multi-scale object detection without extra upsampling. Experimental results on PASCAL VOC 2007 and 2012 demonstrate the superiority of the proposed method over state-of-the-art methods for multi-scale object detection.
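
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the oriented-gradient channels are built, so the number of bins (`num_bins=4`), the unsigned orientation range, the hard binning, and the function names `oriented_gradient_channels`, `stack_input`, and `effective_kernel_size` are all assumptions made for illustration. The last helper uses the standard relation for a dilated kernel, which covers `dilation * (kernel_size - 1) + 1` input positions per axis.

```python
import numpy as np

def oriented_gradient_channels(rgb, num_bins=4):
    """Return (H, W, num_bins) channels of gradient magnitude split by orientation.

    Assumption: unsigned orientations in [0, pi) with hard binning; the
    abstract does not describe the paper's exact construction.
    """
    gray = rgb.astype(np.float64).mean(axis=2)
    gy, gx = np.gradient(gray)  # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    orientation = np.mod(np.arctan2(gy, gx), np.pi)  # fold angles into [0, pi)
    # Map each angle to a bin index, clipping the rare rounding case at pi.
    bin_index = np.minimum((orientation / np.pi * num_bins).astype(int), num_bins - 1)
    channels = np.zeros(gray.shape + (num_bins,))
    rows, cols = np.indices(gray.shape)
    channels[rows, cols, bin_index] = magnitude  # one non-zero bin per pixel
    return channels

def stack_input(rgb, num_bins=4):
    """Concatenate RGB and oriented-gradient channels into one (H, W, 3 + num_bins) input."""
    og = oriented_gradient_channels(rgb, num_bins)
    return np.concatenate([rgb.astype(np.float64), og], axis=2)

def effective_kernel_size(kernel_size, dilation):
    """Input extent covered per axis by a kernel with the given dilation rate."""
    return dilation * (kernel_size - 1) + 1

if __name__ == "__main__":
    image = np.random.rand(32, 48, 3)       # hypothetical RGB image
    print(stack_input(image).shape)         # 3 RGB + 4 orientation channels
    print(effective_kernel_size(3, 2))      # a 3x3 kernel with dilation 2
```

With dilation, a layer can keep stride 1 (and therefore a higher-resolution output map) while each unit still covers the wider input region that a strided or pooled layer would have provided, which is the trade-off the abstract refers to.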

Acknowledgements

This work was supported by National Natural Science Foundation of China (Nos. 61172141, 61976231), Guangdong Basic and Applied Basic Research Foundation (No. 2019A1515011869), Special Program for Applied Research on Super Computation of the NSFC-Guangdong Joint Fund (the second phase, No. U1501501), Project on the Integration of Industry, Education and Research of Guangdong Province (No. 2013B090500013), Science and Technology Program of Guangzhou (Nos. 201803030029, 2014J4100092), and Major Projects for the Innovation of Industry and Research of Guangzhou (No. 2014Y2-00213).

Author information

Authors and Affiliations

School of Data and Computer Science, Sun Yat-sen University, 135 West Xingang Road, Guangzhou, 510275, China
Huicheng Zheng, Lvran Chen, Ye Li & Zhiwei Yan
Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, 135 West Xingang Road, Guangzhou, 510275, China
Huicheng Zheng, Lvran Chen, Ye Li & Zhiwei Yan
Digital Grid Research Institute, China Southern Power Grid, 106 East Fengze Road, Guangzhou, 511458, China
Jiajie Chen

Corresponding author

Correspondence to Huicheng Zheng.

About this article

Cite this article

Zheng, H., Chen, J., Chen, L. et al. Feature Enhancement for Multi-scale Object Detection. Neural Process Lett 51, 1907–1919 (2020). https://doi.org/10.1007/s11063-019-10182-x

Published: 09 January 2020
Issue Date: April 2020
DOI: https://doi.org/10.1007/s11063-019-10182-x
