ABSTRACT
Feature fusion is currently an important technique for improving the performance of object detectors. When extracting features at a given scale, the semantics of all feature points in a large neighborhood centered on each feature point must be taken into account. As the receptive field expands, the information entropy of the feature points decreases, making them easier to learn. We therefore propose a new feature fusion method, Look Around, which differs from previous approaches such as FPN, PAFPN, and BiFPN. Our method makes full use of the relationships among these feature points and supplements the semantic information of each target feature point with that of its neighbors. Extensive experiments on the PASCAL VOC dataset show that Look Around fusion improves mAP by 3.5%, outperforming FPN and FSSD.
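The abstract describes Look Around only at a high level. As a rough illustration of the underlying idea (supplementing each feature point with the semantics of its neighbors), the sketch below blends every point of a single-channel feature map with the mean of a square window around it. This is not the authors' Look Around module: the window mean, the `radius`, the blending weight `alpha`, and the function names `neighborhood_mean` and `fuse_with_neighbors` are hypothetical choices made for illustration. A real detector would more likely use learned, per-channel layers such as convolutions.

```python
from typing import List

# A single channel of a feature map, stored as an H x W grid (hypothetical layout).
FeatureMap = List[List[float]]


def neighborhood_mean(fmap: FeatureMap, radius: int) -> FeatureMap:
    """Average each point with its neighbors in a (2r+1) x (2r+1) window.

    Windows are clipped at the borders, so edge points average fewer neighbors.
    """
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total, count = 0.0, 0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:
                        total += fmap[y][x]
                        count += 1
            out[i][j] = total / count
    return out


def fuse_with_neighbors(fmap: FeatureMap, radius: int = 2, alpha: float = 0.5) -> FeatureMap:
    """Hypothetical fusion: blend each point with the mean of its neighborhood.

    alpha = 0 keeps the original features; alpha = 1 keeps only the context.
    """
    context = neighborhood_mean(fmap, radius)
    return [
        [(1.0 - alpha) * v + alpha * c for v, c in zip(row, ctx_row)]
        for row, ctx_row in zip(fmap, context)
    ]


if __name__ == "__main__":
    # A single strong activation spreads context to its neighbors after fusion.
    spike = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
    for row in fuse_with_neighbors(spike, radius=1, alpha=0.5):
        print(row)
```

Enlarging `radius` widens the neighborhood each point draws on, loosely mirroring the receptive-field expansion the abstract describes; in this toy example, zero-valued points next to the spike receive a nonzero value from their neighbor.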
- B. Li, J. Yan, W. Wu, Z. Zhu, and X. Hu, “High Performance Visual Tracking with Siamese Region Proposal Network,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 8971–8980, 2018, doi: 10.1109/CVPR.2018.00935.
- K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 3rd Int. Conf. Learn. Represent. ICLR 2015 - Conf. Track Proc., pp. 1–14, 2015.
- S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, 2017, doi: 10.1109/TPAMI.2016.2577031.
- M. Tan, R. Pang, and Q. V. Le, “EfficientDet: Scalable and Efficient Object Detection.”
- K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 2, pp. 386–397, 2020, doi: 10.1109/TPAMI.2018.2844175.
- X. Li, T. Lai, S. Wang, Q. Chen, C. Yang, and R. Chen, “Weighted feature pyramid networks for object detection,” Proc. - 2019 IEEE Intl Conf Parallel Distrib. Process. with Appl. Big Data Cloud Comput. Sustain. Comput. Commun. Soc. Comput. Networking, ISPA/BDCloud/SustainCom/SocialCom 2019, pp. 1500–1504, 2019, doi: 10.1109/ISPA-BDCloud-SustainCom-SocialCom48970.2019.00217.
- S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, “Path Aggregation Network for Instance Segmentation,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 8759–8768, 2018, doi: 10.1109/CVPR.2018.00913.
- C. Guo, B. Fan, Q. Zhang, S. Xiang, and C. Pan, “AUGFPN: Improving multi-scale feature learning for object detection,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 12592–12601, 2020, doi: 10.1109/CVPR42600.2020.01261.
- G. Ghiasi, T. Y. Lin, and Q. V. Le, “NAS-FPN: Learning scalable feature pyramid architecture for object detection,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 2019-June, pp. 7029–7038, 2019, doi: 10.1109/CVPR.2019.00720.
- Z. X. Li and F. Q. Zhou, “FSSD: Feature fusion single shot multibox detector,” arXiv, vol. 1, 2017.
- Y. Luo et al., “CE-FPN: Enhancing Channel Information for Object Detection,” vol. 14, no. 8, pp. 1–9, 2015.
- R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 580–587, 2014, doi: 10.1109/CVPR.2014.81.
- R. Girshick, “Fast R-CNN,” Proc. IEEE Int. Conf. Comput. Vis., vol. 2015 Inter, pp. 1440–1448, 2015, doi: 10.1109/ICCV.2015.169.
- J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 2016-Decem, pp. 779–788, 2016, doi: 10.1109/CVPR.2016.91.
- J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” Proc. - 30th IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017, vol. 2017-Janua, pp. 6517–6525, 2017, doi: 10.1109/CVPR.2017.690.
- J. Redmon and A. Farhadi, “YOLO v.3,” Tech Rep., pp. 1–6, 2018, [Online]. Available: https://pjreddie.com/media/files/papers/YOLOv3.pdf.
- A. Bochkovskiy, C. Y. Wang, and H. Y. M. Liao, “YOLOv4: Optimal Speed and Accuracy of Object Detection,” arXiv, 2020.
- W. Liu et al., “SSD: Single shot multibox detector,” Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics), vol. 9905 LNCS, pp. 21–37, 2016, doi: 10.1007/978-3-319-46448-0_2.
- C. V Mar, “CornerNet: Detecting Objects as Paired Keypoints,” 2017.
- Q. Zhao et al., “M2det: A single-shot object detector based on multi-level feature pyramid network,” 33rd AAAI Conf. Artif. Intell. AAAI 2019, 31st Innov. Appl. Artif. Intell. Conf. IAAI 2019 9th AAAI Symp. Educ. Adv. Artif. Intell. EAAI 2019, pp. 9259–9266, 2019, doi: 10.1609/aaai.v33i01.33019259.
- T. Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, “Focal Loss for Dense Object Detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 2, pp. 318–327, 2020, doi: 10.1109/TPAMI.2018.2858826.
- H. Zhao, J. Shi, X. Qi, X. Wang, J. Jia, and S. G. Limited, “Pyramid Scene Parsing Network.”
- F. Yu and V. Koltun, “Multi-scale context aggregation by dilated convolutions,” 4th Int. Conf. Learn. Represent. ICLR 2016 - Conf. Track Proc., 2016.
- K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., vol. 2016-Decem, pp. 770–778, 2016, doi: 10.1109/CVPR.2016.90.