Abstract
Object detection has become a central task in the computer vision community, yet scale variation of objects in images and videos remains a major obstacle to improving performance. To address this problem, conventional paradigms generally adopt an image pyramid or a Feature Pyramid Network (FPN) to process objects at different scales. However, existing multi-scale deep convolutional neural networks mostly set their scales heuristically, which may introduce inconsistency between the region of interest and the semantic scope. In this paper, we propose a paradigm called Consistent Scale Normalization (CSN) to weaken the influence of scale variation on object detection. CSN compresses the scale space of objects consistently in both the training and testing phases. Extensive experiments are performed on the COCO object detection benchmark in comparison with several state-of-the-art methods. In addition to object detection, experiments on instance segmentation and multi-task human pose estimation are also conducted. Furthermore, the CSN paradigm helps reduce the difficulty of network learning. The results verify the effectiveness and superiority of the CSN paradigm.
Acknowledgements
The authors are thankful for the financial support from the National Key Research and Development Program of China (No. 2016QY03D0500) (Grant No. 41871020) and the National Natural Science Foundation of China (Grant No. U1636220 and 61906190).
Cite this article
He, Z., Huang, H., Wu, Y. et al. Consistent scale normalization for object perception. Appl Intell 51, 4490–4502 (2021). https://doi.org/10.1007/s10489-020-02070-y
DOI: https://doi.org/10.1007/s10489-020-02070-y