Consistent scale normalization for object perception

He, Zewen; Huang, He; Wu, Yudong; Yang, Xuebing; Zhang, Wensheng

doi:10.1007/s10489-020-02070-y

Published: 04 January 2021

Applied Intelligence, Volume 51, pages 4490–4502 (2021)

Zewen He^1,2, He Huang^1,2, Yudong Wu^1,2, Xuebing Yang^1 & Wensheng Zhang^1,2

Abstract

Object detection has become a central task in the vision community, but the scale variation of objects in images and videos remains a major obstacle to further performance gains. To address this problem, conventional approaches generally adopt an image pyramid or a Feature Pyramid Network (FPN) to process objects at different scales. However, most existing multi-scale deep convolutional neural networks set these scales heuristically, which may introduce an inconsistency between the region of interest and the semantic scope. In this paper, we propose a new paradigm called Consistent Scale Normalization (CSN) to weaken the influence of scale variation on object detection. CSN consistently compresses the scale space of objects in both the training and testing phases. Extensive experiments are performed on the COCO object detection benchmark in comparison with several state-of-the-art methods. In addition to object detection, experiments on instance segmentation and multi-task human pose estimation are also conducted. Furthermore, the CSN paradigm helps reduce the difficulty of network learning. The results verify the effectiveness and superiority of the CSN paradigm.
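The abstract describes CSN only at a high level. As a rough illustration of what a "consistent compression of the scale space" could look like on top of an image pyramid, the sketch below assumes a single shared valid range of object sizes, measured after resizing, that is applied identically at every pyramid level and in both training and testing. The function names, the sqrt(area) size measure, the resize factors, and the numeric range are all hypothetical choices made for illustration; this is not the authors' published implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical sketch, not the CSN authors' code.
# A box is (x1, y1, x2, y2) in pixels of the ORIGINAL (unresized) image.
Box = Tuple[float, float, float, float]


def object_scale(box: Box) -> float:
    """sqrt(width * height): a common scalar measure of object size."""
    x1, y1, x2, y2 = box
    return math.sqrt(max(0.0, x2 - x1) * max(0.0, y2 - y1))


def in_valid_range(box: Box, factor: float, valid_range: Tuple[float, float]) -> bool:
    """True if the box, after resizing the image by `factor`, has a scale
    inside the shared valid range. The SAME range is used at every level."""
    lo, hi = valid_range
    return lo <= factor * object_scale(box) <= hi


def assign_to_levels(
    boxes: Sequence[Box],
    factors: Sequence[float],
    valid_range: Tuple[float, float],
) -> Dict[float, List[int]]:
    """Map each pyramid resize factor to the indices of boxes that are valid
    at that level. The assumption here is that ground-truth boxes are filtered
    this way for training and detections are filtered the same way at test
    time, so both phases see one consistent range of object sizes."""
    return {
        f: [i for i, b in enumerate(boxes) if in_valid_range(b, f, valid_range)]
        for f in factors
    }


if __name__ == "__main__":
    boxes = [(0, 0, 16, 16), (0, 0, 64, 64), (0, 0, 256, 256)]
    print(assign_to_levels(boxes, factors=[4.0, 1.0, 0.25], valid_range=(32.0, 128.0)))
    # {4.0: [0], 1.0: [1], 0.25: [2]}
```

In the example, objects of 16, 64, and 256 pixels are each kept only at the level that resizes them to 64 pixels, so a 16x spread in original object size is reduced to a spread no wider than the 4x width of the shared range.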

Acknowledgements

The authors are thankful for the financial support from the National Key Research and Development Program of China (No. 2016QY03D0500) (Grant No. 41871020) and the National Natural Science Foundation of China (Grant No. U1636220 and 61906190).

Author information

Authors and Affiliations

Institute of Automation, Chinese Academy of Sciences, Beijing, China
Zewen He, He Huang, Yudong Wu, Xuebing Yang & Wensheng Zhang
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Zewen He, He Huang, Yudong Wu & Wensheng Zhang

Corresponding author

Correspondence to Wensheng Zhang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

He, Z., Huang, H., Wu, Y. et al. Consistent scale normalization for object perception. Appl Intell 51, 4490–4502 (2021). https://doi.org/10.1007/s10489-020-02070-y

Accepted: 06 November 2020
Published: 04 January 2021
Issue Date: July 2021
DOI: https://doi.org/10.1007/s10489-020-02070-y
