Abstract
Most anchor-free detectors suffer from low detection accuracy, while anchor-based detectors are limited by slow detection speed. To balance detection accuracy and speed for traffic scene objects, this paper proposes a new anchor-free detector called FABNet. The method consists of three main components: a feature pyramid fusion module (FPFM), a cascade attention module (CAM), and a boundary feature extraction module (BFEM). First, the feature pyramid fusion module generates richer semantic information; it not only improves detection accuracy but also addresses the detection of objects of different sizes. Second, the cascade attention module exploits hierarchical attention, spatial attention, and channel attention to obtain local feature representations, which improves the representation ability of the detection head. Finally, to obtain more foreground information under complex backgrounds, the boundary feature extraction module effectively extracts object boundary features. We conduct extensive experiments on three public datasets, i.e., BDD100K, PASCAL VOC, and KITTI. The results show that our method achieves state-of-the-art performance in both accuracy and speed.
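The abstract describes the cascade attention module (CAM) only at a high level, so the PyTorch sketch below is merely illustrative: it chains a channel-attention stage and a spatial-attention stage on a single pyramid feature map and omits the hierarchical attention across pyramid levels. All class names, the reduction ratio, and the stage order are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention (assumed design)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Re-weight each channel by a learned global descriptor.
        return x * self.fc(self.pool(x))

class SpatialAttention(nn.Module):
    """CBAM-style spatial attention over pooled channel statistics (assumed design)."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        avg_map = torch.mean(x, dim=1, keepdim=True)
        max_map, _ = torch.max(x, dim=1, keepdim=True)
        attn = self.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn

class CascadeAttention(nn.Module):
    """Hypothetical cascade: channel attention followed by spatial attention,
    applied to a pyramid feature map before the detection head."""
    def __init__(self, channels):
        super().__init__()
        self.channel_attn = ChannelAttention(channels)
        self.spatial_attn = SpatialAttention()

    def forward(self, x):
        return self.spatial_attn(self.channel_attn(x))

# Example: refine a 256-channel FPN-like feature map of size 64x64.
feat = torch.randn(1, 256, 64, 64)
refined = CascadeAttention(256)(feat)
print(refined.shape)  # torch.Size([1, 256, 64, 64])
```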
Data availability
The data that support the findings of this study are available from the corresponding author on reasonable request.
Acknowledgments
This work was supported in part by NSFC (61572286 and 61472220), the NSFC Joint Fund with Zhejiang Integration of Informatization and Industrialization under Key Project (U1609218), and the Fostering Project of Dominant Discipline and Talent Team of Shandong Province Higher Education.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ding, T., Feng, K., Yan, Y. et al. An improved anchor-free method for traffic scene object detection. Multimed Tools Appl 82, 34703–34724 (2023). https://doi.org/10.1007/s11042-023-15077-7