ABSTRACT
Sparse R-CNN detects objects with a purely sparse method and achieves good results. However, it does not make full use of the features extracted from the image, so its detection performance can be further improved. We propose Sparse R-CNNv1 and Sparse R-CNNv2. In both algorithms, we replace the ResNet backbone of the original Sparse R-CNN with VoVNet equipped with an attention mechanism. In addition, we use two different improved neck networks, Augpan and FPNencoder, to further improve detection performance from the perspective of feature fusion and of increasing the receptive field of each layer, respectively. Our algorithms are trained and validated on COCO2017, and the experimental results show that Sparse R-CNNv1 achieves 45.0 AP and Sparse R-CNNv2 achieves 45.3 AP, both higher than the 43.0 AP of the original Sparse R-CNN under the standard 3× training schedule.
Index Terms
- Improving the Detection Performance of Sparse R-CNN with Different Necks