Guided Refine-Head for Object Detection

Zeng, Lingyun; Song, You; Wang, Wenhai

doi:10.1007/978-3-030-37731-1_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11961)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

In recent years, multi-stage detectors have raised the accuracy of object detection to a new level. However, because they run multiple stages, these methods are typically slow at inference. To alleviate this problem, we propose a novel object detector, Guided Refine-Head, which consists of a newly proposed detection network called Refine-Head and a knowledge-distillation-like loss function. Because Refine-Head is a two-stage detector, it runs faster at inference than multi-stage detectors. Nonetheless, like a multi-stage detector, Refine-Head can predict bounding boxes for progressively increasing IoU thresholds. In addition, we use the knowledge-distillation-like loss function to guide the training of Refine-Head. As a result, besides fast inference, the proposed Guided Refine-Head also achieves competitive accuracy. Extensive ablation studies and comparative experiments on MS-COCO 2017 validate the superiority of Guided Refine-Head. Notably, Guided Refine-Head achieves an AP of 38.0% at 10.4 FPS, surpassing Faster R-CNN by 1.8% at a similar speed.
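The abstract names two ingredients but does not give their exact form: predictions tied to increasing IoU thresholds, and a knowledge-distillation-like training loss. As a hedged illustration only, the sketch below shows generic versions of both. The thresholds 0.5/0.6/0.7 are an assumption (values commonly used by cascade-style detectors), and the loss is the standard soft-target distillation loss of Hinton et al., swapped in as a stand-in; the paper's actual loss, teacher, and temperature may differ. The helper names (`iou`, `foreground_masks`, `distillation_loss`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def foreground_masks(proposals: Sequence[Box], gt_boxes: Sequence[Box],
                     thresholds: Sequence[float] = (0.5, 0.6, 0.7)) -> List[List[bool]]:
    """For each proposal, mark whether it counts as foreground at each IoU threshold.

    Thresholds are an assumption; a stricter threshold keeps fewer, better-aligned proposals.
    """
    best = [max((iou(p, g) for g in gt_boxes), default=0.0) for p in proposals]
    return [[b >= t for t in thresholds] for b in best]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a temperature."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def distillation_loss(student_logits: Sequence[float], teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """Hinton-style soft-target loss: T^2 * KL(teacher || student) on softened distributions."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s) if pt > 0)
    return temperature ** 2 * kl


if __name__ == "__main__":
    gt = [(0.0, 0.0, 10.0, 10.0)]
    props = [(0.0, 0.0, 10.0, 10.0), (0.0, 0.0, 10.0, 6.5), (5.0, 5.0, 15.0, 15.0)]
    print(foreground_masks(props, gt))
    print(distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0]))
```

Here a proposal with IoU 0.65 is foreground at the 0.5 and 0.6 thresholds but background at 0.7, which is the kind of per-threshold supervision a cascade-like head relies on. The distillation term is zero when the student matches the teacher and grows as their softened class distributions diverge.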

Author information

Authors and Affiliations

Beihang University, Beijing, China
Lingyun Zeng & You Song
Nanjing University, Nanjing, China
Wenhai Wang

Corresponding author

Correspondence to You Song.

Editor information

Editors and Affiliations

Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Yong Man Ro
National Chiao Tung University, Hsinchu, Taiwan
Wen-Huang Cheng
Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Junmo Kim
National Cheng Kung University, Tainan City, Taiwan
Wei-Ta Chu
Tsinghua University, Beijing, China
Peng Cui
Korea Advanced Institute of Science and Technology, Daejeon, Korea (Republic of)
Jung-Woo Choi
National Tsing Hua University, Hsinchu, Taiwan
Min-Chun Hu
Ghent University, Ghent, Belgium
Wesley De Neve

About this paper

Cite this paper

Zeng, L., Song, Y., Wang, W. (2020). Guided Refine-Head for Object Detection. In: Ro, Y., et al. MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol 11961. Springer, Cham. https://doi.org/10.1007/978-3-030-37731-1_17

DOI: https://doi.org/10.1007/978-3-030-37731-1_17
Published: 24 December 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37730-4
Online ISBN: 978-3-030-37731-1
eBook Packages: Computer Science; Computer Science (R0)