Learning region-guided scale-aware feature selection for object detection

Liu, Liu; Wang, Rujing; Xie, Chengjun; Li, Rui; Wang, Fangyuan; Zhou, Man; Teng, Yue

doi:10.1007/s00521-020-05400-w

Learning region-guided scale-aware feature selection for object detection

Original Article
Published: 10 October 2020

Volume 33, pages 6389–6403, (2021)
Cite this article

Neural Computing and Applications Aims and scope Submit manuscript

Liu Liu ORCID: orcid.org/0000-0003-4218-8008^1,2,
Rujing Wang²,
Chengjun Xie²,
Rui Li²,
Fangyuan Wang^1,2,
Man Zhou^1,2 &
…
Yue Teng^1,2

465 Accesses
3 Citations
Explore all metrics

Abstract

Scale variation is one of the major challenges in object detection task. Modern region-based object detection architectures often adopt Feature Pyramid Network (FPN) as feature extraction neck to achieve multi-scale feature representation in solving scale variation problem. However, due to the rough feature selection strategy in Region of Interest (RoI) feature extraction step, these methods might not perform well on object detection under strong scale variation. In this work, we are motivated by the limitations of current FPN-based two-stage object detectors and then present a novel module, namely scale-aware feature selective (SAFS) module, that flexibly and adaptively selects feature levels in two-stage object detectors. Specifically, we firstly build the RoI Pyramid in standard FPN structure to extract RoI features from various scale levels. Next, in order to achieve scale-aware mechanism for solving scale variation issue, we develop a novel weighting gate function containing one set of trainable parameters to automatically learn the fusion weight for each RoI feature level, which relieves the limitation of hard feature selection strategy guided by online instance size. Outputs from the RoI features with the learned weights are fused for classification and bounding box regression. Furthermore, we design a multi-level SAFS architecture to obtain different types of RoI feature combinations that ensures our method is more robust to various instance scales. Experimental results show that our SAFS module is very compatible with most of two-stage object detectors and could achieve state-of-the-art results with Average Precision of 48.3 on COCO test-dev and other popular object detection benchmarks. Our code will be made publicly available.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Multi-level feature fusion pyramid network for object detection

Article 04 July 2022

Scale-Aware Regional Collective Feature Enhancement Network for Scene Object Detection

Article 03 January 2023

Selective Multi-scale Learning for Object Detection

References

Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105
Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:14091556
Girshick R (2015) Fast R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp 1440–1448
Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440
Girshick R, Donahue J, Darrell T, Malik J (2014) Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 580–587
Redmon J, Farhadi A (2018) YOLOv3: an incremental improvement. arXiv preprint arXiv:180402767
Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY, Berg AC (2016) SSD: single shot multibox detector. In: European conference on computer vision. Springer, pp 21–37
Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S (2017) Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2117–2125
Yang F, Choi W, Lin Y (2016) Exploit all the layers: fast and accurate CNN object detector with scale dependent pooling and cascaded rejection classifiers. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2129–2137
Dai J, Li Y, He K, Sun J (2016) R-FCN: object detection via region-based fully convolutional networks. In: Advances in neural information processing systems, pp 379–387
He K, Gkioxari G, Dollár P, Girshick R (2017) Mask R-CNN. In: Proceedings of the IEEE international conference on computer vision, pp 2961–2969
Hu P, Ramanan D (2017) Finding tiny faces. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 951–959
Singh B, Davis LS (2018) An analysis of scale invariance in object detection snip. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3578–3587
Li J, Liang X, Shen S, Xu T, Feng J, Yan S (2017) Scale-aware fast R-CNN for pedestrian detection. IEEE Trans Multimedia 20(4):985–996
Google Scholar
Li Y, Chen Y, Wang N, Zhang Z (2019) Scale-aware trident networks for object detection. In: Proceedings of the IEEE international conference on computer vision, pp 6054–6063
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 779–788
Redmon J, Farhadi A (2017) YOLO9000: better, faster, stronger. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7263–7271
Fu CY, Liu W, Ranga A, Tyagi A, Berg AC (2017) DSSD: deconvolutional single shot detector. arXiv preprint arXiv:170106659
Lin TY, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE international conference on computer vision, pp 2980–2988
Zhang S, Wen L, Bian X, Lei Z, Li SZ (2018) Single-shot refinement neural network for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4203–4212
Uijlings JR, Van De Sande KE, Gevers T, Smeulders AW (2013) Selective search for object recognition. Int J Comput Vis 104(2):154–171
Article Google Scholar
Ren S, He K, Girshick R, Sun J (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In: Advances in neural information processing systems, pp 91–99
Cai Z, Vasconcelos N (2018) Cascade R-CNN: delving into high quality object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6154–6162
Pang J, Chen K, Shi J, Feng H, Ouyang W, Lin D (2019) Libra R-CNN: towards balanced learning for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 821–830
Kong T, Yao A, Chen Y, Sun F (2016) Hypernet: towards accurate region proposal generation and joint object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 845–853
Liu S, Qi L, Qin H, Shi J, Jia J (2018) Path aggregation network for instance segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8759–8768
Yu F, Koltun V (2015) Multi-scale context aggregation by dilated convolutions. arXiv preprint arXiv:151107122
Zhu C, He Y, Savvides M (2019) Feature selective anchor-free module for single-shot object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 840–849
Lin TY, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL (2014) Microsoft coco: common objects in context. In: European conference on computer vision. Springer, pp 740–755
Krishna K, Murty MN (1999) Genetic k-means algorithm. IEEE Trans Syst Man Cybern Part B (Cybern) 29(3):433–439
Article Google Scholar
Neubeck A, Van Gool L (2006) Efficient non-maximum suppression. In: 18th international conference on pattern recognition (ICPR’06), vol 3. IEEE, pp 850–855
Hecht-Nielsen R (1992) Theory of the backpropagation neural network. In: Neural networks for perception. Elsevier, pp 65–93
Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. IEEE, pp 248–255
Girshick R, Radosavovic I, Gkioxari G, Dollár P, He K (2018) Detectron. https://github.com/facebookresearch/detectron
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778
Xie S, Girshick R, Dollár P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1492–1500
Zhu X, Hu H, Lin S, Dai J (2019) Deformable convnets v2: more deformable, better results. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 9308–9316
Everingham M, Van Gool L, Williams CK, Winn J, Zisserman A (2010) The pascal visual object classes (VOC) challenge. Int J Comput Vis 88(2):303–338
Article Google Scholar
Zhang H, Li D, Ji Y, Zhou H, Wu W, Liu K (2019) Towards new retail: a benchmark dataset for smart unmanned vending machines. IEEE Trans Ind Inform 16:7722–7731
Article Google Scholar

Download references

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant Nos. 61773360 and 31671586 and in part by the Major Special Science and Technology Project of Anhui Province under Grant No. 201903a06020006.

Author information

Authors and Affiliations

University of Science and Technology of China, Hefei, 230026, China
Liu Liu, Fangyuan Wang, Man Zhou & Yue Teng
Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei, 230031, China
Liu Liu, Rujing Wang, Chengjun Xie, Rui Li, Fangyuan Wang, Man Zhou & Yue Teng

Authors

Liu Liu
View author publications
You can also search for this author in PubMed Google Scholar
Rujing Wang
View author publications
You can also search for this author in PubMed Google Scholar
Chengjun Xie
View author publications
You can also search for this author in PubMed Google Scholar
Rui Li
View author publications
You can also search for this author in PubMed Google Scholar
Fangyuan Wang
View author publications
You can also search for this author in PubMed Google Scholar
Man Zhou
View author publications
You can also search for this author in PubMed Google Scholar
Yue Teng
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Rujing Wang.

Ethics declarations

Conflict of interest

We declare that we do not have any commercial or associative interest that represents a conflict of interest in connection with the work submitted.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Liu, L., Wang, R., Xie, C. et al. Learning region-guided scale-aware feature selection for object detection. Neural Comput & Applic 33, 6389–6403 (2021). https://doi.org/10.1007/s00521-020-05400-w

Download citation

Received: 08 May 2020
Accepted: 24 September 2020
Published: 10 October 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s00521-020-05400-w

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Learning region-guided scale-aware feature selection for object detection

Abstract

Access this article

Similar content being viewed by others

Multi-level feature fusion pyramid network for object detection

Scale-Aware Regional Collective Feature Enhancement Network for Scene Object Detection

Selective Multi-scale Learning for Object Detection

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Learning region-guided scale-aware feature selection for object detection

Abstract

Access this article

Similar content being viewed by others

Multi-level feature fusion pyramid network for object detection

Scale-Aware Regional Collective Feature Enhancement Network for Scene Object Detection

Selective Multi-scale Learning for Object Detection

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation