Abstract
Object detection has been widely studied over the last few decades. However, handling objects at different scales remains challenging. To address this problem, we explore how to better utilize the multi-scale features generated by a convolutional neural network. Specifically, we fuse a set of pyramidal features in a circular manner and propose a cascaded module designed to enhance a single-scale feature with information from a feature at a different scale. We then make the module recurrent to further facilitate the fusion of information among multi-scale features. The proposed module can be integrated into any pyramid architecture. In this paper, we combine it with FPN-based Faster R-CNN, resulting in a framework named Recurrent Pyramidal Fusion Network (R-PFN). Experiments demonstrate the effectiveness of R-PFN. We achieve new state-of-the-art performance, i.e., 82.0% and 43.3% mean AP on the PASCAL VOC 2007 and MS COCO benchmarks, respectively.
Student Paper.
Acknowledgements
This work is supported in part by the National Key Research and Development of China (2017YFC1703503) and the National Natural Science Foundation of China (61972022, 61532005).
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Jia, Q., Wei, S., Zhao, Y. (2020). R-PFN: Towards Precise Object Detection by Recurrent Pyramidal Feature Fusion. In: Peng, Y., et al. Pattern Recognition and Computer Vision. PRCV 2020. Lecture Notes in Computer Science, vol 12305. Springer, Cham. https://doi.org/10.1007/978-3-030-60633-6_47
DOI: https://doi.org/10.1007/978-3-030-60633-6_47
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60632-9
Online ISBN: 978-3-030-60633-6
eBook Packages: Computer Science, Computer Science (R0)