
Consistent scale normalization for object perception

Applied Intelligence

Abstract

Object detection is a central task in the vision community, but scale variation of objects in images and videos remains an obstacle to further performance gains. To combat this problem, conventional paradigms generally adopt an image pyramid or a Feature Pyramid Network (FPN) to process objects at different scales. However, existing multi-scale deep convolutional neural networks mostly set the scales heuristically, which may introduce inconsistency between the region of interest and the semantic scope. In this paper, we propose a paradigm called Consistent Scale Normalization (CSN) to weaken the influence of scale variation on object detection. CSN compresses the scale space of objects consistently in both the training and testing phases. Extensive experiments are performed on the COCO object detection benchmark in comparison with several state-of-the-art methods. In addition to object detection, experiments on instance segmentation and multi-task human pose estimation are also conducted. Furthermore, the CSN paradigm helps reduce the difficulty of network learning. The results verify the effectiveness and superiority of the CSN paradigm.
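The general idea of compressing the scale space of objects with an image pyramid can be illustrated with a minimal sketch. This is not the authors' CSN implementation, which the abstract does not specify. The function names, the pyramid factors (0.25, 1.0, 4.0), and the target size range [32, 128] pixels are hypothetical values chosen only for illustration.

```python
import math


def object_size(width, height):
    """Scale of an object, taken here as the square root of its box area."""
    return math.sqrt(width * height)


def scales_in_range(size, scales, lo, hi):
    """Return the pyramid factors at which the resized object falls in [lo, hi]."""
    return [s for s in scales if lo <= size * s <= hi]


# Hypothetical pyramid factors and target range (assumptions, not from the paper).
PYRAMID = [0.25, 1.0, 4.0]
LO, HI = 32.0, 128.0

# Three objects whose original sizes span a 16x range: 16, 64, and 256 pixels.
sizes = [object_size(16, 16), object_size(64, 64), object_size(256, 256)]

for size in sizes:
    chosen = scales_in_range(size, PYRAMID, LO, HI)
    normalized = [size * s for s in chosen]
    print(f"original={size:.0f}px  factors={chosen}  normalized={normalized}")
```

In this toy setting each object is handled only at the pyramid level that brings it into the shared range, so the 16x spread of original sizes collapses to a single normalized size. Applying the same range when selecting levels in training and in testing is what would make such a compression consistent across the two phases.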

Notes

  1. https://mxnet.apache.org/

  2. https://github.com/open-mmlab/mmdetection/blob/d2a8ba76457822daf5a866b72cb1f9ae8a7fc52f/docs/model_zoo.md

Acknowledgements

The authors are thankful for the financial support from the National Key Research and Development Program of China (No. 2016QY03D0500) (Grant No. 41871020) and the National Natural Science Foundation of China (Grant No. U1636220 and 61906190).

Author information

Corresponding author

Correspondence to Wensheng Zhang.

About this article

Cite this article

He, Z., Huang, H., Wu, Y. et al. Consistent scale normalization for object perception. Appl Intell 51, 4490–4502 (2021). https://doi.org/10.1007/s10489-020-02070-y

  • DOI: https://doi.org/10.1007/s10489-020-02070-y
