Improving Deep Object Detection Backbone with Feature Layers

Hong, Weiheng; Song, Andy

doi:10.1007/978-3-030-77977-1_8

Weiheng Hong^13,14 &
Andy Song¹³

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 12746))

Included in the following conference series:

International Conference on Computational Science

2676 Accesses
2 Citations

Abstract

Deep neural networks are the frontier in object detection, a key modern computing task. The dominant methods involve two-stage deep networks that heavily rely on features extracted by the backbone in the first stage. In this study, we propose an improved model, ResNeXt101S, to improve feature quality for layers that might be too deep. It introduces splits in middle layers for feature extraction and a deep feature pyramid network (DFPN) for feature aggregation. This backbone is neither much larger than the leading model ResNeXt nor increasing computational complexity distinctly. It is applicable to a range of different image resolutions. The evaluation of customized benchmark datasets using various image resolutions shows that the improvement is effective and consistent. In addition, the study shows input resolution does impact detection performance. In short, our proposed backbone can achieve better accuracy under different resolutions comparing to state-of-the-art models.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Object Detection Based on Multiscale Merged Feature Map

Exploring Multi-scale Deep Feature Fusion for Object Detection

NAS-WFPN: Neural Architecture Search Weighted Feature Pyramid Networks for Object Detection

Notes

1.
Available on the MS COCO dataset website http://cocodataset.org.

References

Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
Google Scholar
Everingham, M., Eslami, S.A., Van Gool, L., Williams, C.K., Winn, J., Zisserman, A.: The pascal visual object classes challenge: a retrospective. Int. J. Comput. Vision 111(1), 98–136 (2015)
Article Google Scholar
Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1440–1448 (2015)
Google Scholar
Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014)
Google Scholar
He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2961–2969 (2017)
Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Google Scholar
Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017)
Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4700–4708 (2017)
Google Scholar
Krizhevsky, A.: One weird trick for parallelizing convolutional neural networks. arXiv preprint arXiv:1404.5997 (2014)
Kuznetsova, A., et al.: The open images dataset v4: unified image classification, object detection, and visual relationship detection at scale. arXiv preprint arXiv:1811.00982 (2018)
Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2117–2125 (2017)
Google Scholar
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Chapter Google Scholar
Liu, L., et al.: Deep learning for generic object detection: a survey. arXiv preprint arXiv:1809.02165 (2018)
Liu, W., et al.: SSD: single shot MultiBox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Chapter Google Scholar
Liu, Y., et al.: CBNet: a novel composite backbone network architecture for object detection. arXiv preprint arXiv:1909.03625 (2019)
Liu, Z., et al.: Swin transformer: hierarchical vision transformer using shifted windows. arXiv preprint arXiv:2103.14030 (2021)
Lui, Y.M., Bolme, D., Draper, B.A., Beveridge, J.R., Givens, G., Phillips, P.J.: A meta-analysis of face recognition covariates. In: 2009 IEEE 3rd International Conference on Biometrics: Theory, Applications, and Systems, pp. 1–8. IEEE (2009)
Google Scholar
Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788 (2016)
Google Scholar
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems, pp. 91–99 (2015)
Google Scholar
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115(3), 211–252 (2015)
Article MathSciNet Google Scholar
Shekhar, S., Patel, V.M., Chellappa, R.: Synthesis-based robust low resolution face recognition. arXiv preprint arXiv:1707.02733 (2017)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Tan, M., Pang, R., Le, Q.V.: EfficientDet: scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10781–10790 (2020)
Google Scholar
Tang, P., et al.: Weakly supervised region proposal network and object detection. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 370–386. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01252-6_22
Chapter Google Scholar
Wang, C.Y., Bochkovskiy, A., Liao, H.Y.M.: Scaled-yolov4: scaling cross stage partial network. arXiv preprint arXiv:2011.08036 (2020)
Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492–1500 (2017)
Google Scholar
Yohanandan, S., Song, A., Dyer, A.G., Tao, D.: Saliency preservation in low-resolution grayscale images. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11210, pp. 237–254. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01231-1_15
Chapter Google Scholar
Zangeneh, E., Rahmati, M., Mohsenzadeh, Y.: Low resolution face recognition using a two-branch deep convolutional neural network architecture. arXiv preprint arXiv:1706.06247 (2017)
Zeiler, M.D.: Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701 (2012)

Download references

Author information

Authors and Affiliations

RMIT University, Melbourne, VIC, 3000, Australia
Weiheng Hong & Andy Song
Xiamen Research Center of Urban Planning Digital Technology, Xiamen, 361015, Fujian, China
Weiheng Hong

Authors

Weiheng Hong
View author publications
You can also search for this author in PubMed Google Scholar
Andy Song
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

AGH University of Science and Technology, Krakow, Poland
Maciej Paszynski
Ludwig-Maximilians-Universität München, Munich, Germany
Dieter Kranzlmüller
University of Amsterdam, Amsterdam, The Netherlands
Valeria V. Krzhizhanovskaya
University of Tennessee at Knoxville, Knoxville, TN, USA
Jack J. Dongarra
University of Amsterdam, Amsterdam, The Netherlands
Peter M.A. Sloot

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Hong, W., Song, A. (2021). Improving Deep Object Detection Backbone with Feature Layers. In: Paszynski, M., Kranzlmüller, D., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science(), vol 12746. Springer, Cham. https://doi.org/10.1007/978-3-030-77977-1_8

Download citation

DOI: https://doi.org/10.1007/978-3-030-77977-1_8
Published: 09 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-77976-4
Online ISBN: 978-3-030-77977-1
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Improving Deep Object Detection Backbone with Feature Layers

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Object Detection Based on Multiscale Merged Feature Map

Exploring Multi-scale Deep Feature Fusion for Object Detection

NAS-WFPN: Neural Architecture Search Weighted Feature Pyramid Networks for Object Detection

Notes

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Publish with us

Subscribe and save

Buy Now

Navigation

Improving Deep Object Detection Backbone with Feature Layers

Abstract

Access this chapter

Subscribe and save

Buy Now

Similar content being viewed by others

Object Detection Based on Multiscale Merged Feature Map

Exploring Multi-scale Deep Feature Fusion for Object Detection

NAS-WFPN: Neural Architecture Search Weighted Feature Pyramid Networks for Object Detection

Notes

References

Author information

Authors and Affiliations

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation