Abstract
In recent years, multi-stage detectors have raised the accuracy of object detection to a new level. However, because of their multiple stages, these methods typically fall short in inference speed. To alleviate this problem, we propose a novel object detector, Guided Refine-Head, which consists of a newly proposed detection network called Refine-Head and a knowledge-distillation-like loss function. Refine-Head is a two-stage detector and therefore has faster inference speed than multi-stage detectors. Nonetheless, Refine-Head is able to predict bounding boxes for incremental IoU thresholds like a multi-stage detector. In addition, we use a knowledge-distillation-like loss function to guide the training process of Refine-Head. Therefore, besides fast inference speed, the proposed Guided Refine-Head also has competitive accuracy. Extensive ablation studies and comparative experiments on MS-COCO 2017 validate the superiority of the proposed Guided Refine-Head. Notably, Guided Refine-Head achieves an AP of 38.0% at 10.4 FPS, surpassing Faster R-CNN by 1.8% at a similar speed.
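The notion of "incremental IoU thresholds" can be illustrated with a small sketch. The abstract does not describe how Refine-Head is built, so the code below shows only the general idea: the same proposal can count as a positive under a loose IoU threshold and as a negative under a stricter one. The threshold values (0.5, 0.6, 0.7) and the helper names `iou` and `positives_per_threshold` are assumptions chosen for illustration, not details taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple

# A box is (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def positives_per_threshold(
    proposals: Sequence[Box],
    gt: Box,
    thresholds: Sequence[float] = (0.5, 0.6, 0.7),  # assumed values
) -> Dict[float, List[bool]]:
    """For each threshold, mark which proposals count as positives."""
    return {t: [iou(p, gt) >= t for p in proposals] for t in thresholds}


if __name__ == "__main__":
    gt_box = (0.0, 0.0, 10.0, 10.0)
    candidates = [
        (0.0, 0.0, 10.0, 10.0),  # perfect overlap
        (0.0, 0.0, 10.0, 6.5),   # partial overlap
        (0.0, 0.0, 10.0, 5.5),   # weaker overlap
    ]
    for t, flags in positives_per_threshold(candidates, gt_box).items():
        print(t, flags)
```

As the threshold rises, fewer proposals qualify as positives, which is why detectors trained for higher thresholds generally need better-localized inputs.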
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Zeng, L., Song, Y., Wang, W. (2020). Guided Refine-Head for Object Detection. In: Ro, Y., et al. MultiMedia Modeling. MMM 2020. Lecture Notes in Computer Science, vol 11961. Springer, Cham. https://doi.org/10.1007/978-3-030-37731-1_17
DOI: https://doi.org/10.1007/978-3-030-37731-1_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-37730-4
Online ISBN: 978-3-030-37731-1
eBook Packages: Computer Science, Computer Science (R0)