Abstract
Unmanned Aerial Vehicles (UAVs) are used in place of humans to complete aerial assignments in various fields. With the development of computer vision, object detection has become one of the core technologies in UAV applications. However, small targets are often missed, and detection performance on them is far below that on large targets. In this paper, we propose a dual inspection mechanism, which identifies missed targets in suspicious areas to assist single-stage detection branches, and shares the two decisions so that feature-level multi-instance detection modules produce reliable results. First, we confirm that the missed targets lie among the detection results that do not reach the confidence threshold. We therefore compute the feature vector provided by a denoising sparse autoencoder and filter this part of the results a second time. Second, we show empirically that a single detection result is not reliable enough, and that multiple attributes of the target need to be considered. Motivated by this, the initial and secondary detection results are combined and ranked by importance. Finally, we assign the corresponding confidence to the top-ranked instances, making it possible for them to be recognized as objects again. Experimental results show that our mechanism improves mAP by 2.7% on the VisDrone2020 dataset, 1.0% on the UAVDT dataset and 1.8% on the MS COCO dataset. The proposed detection mechanism achieves state-of-the-art performance on these datasets and performs better on small object detection.
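The control flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the two decisions are combined, so the weighted average, the `top_k` cutoff, and the names `Detection`, `dual_inspection`, and `second_score` are all assumptions introduced here. The `second_score` callable stands in for the secondary inspection (in the paper, one based on denoising sparse autoencoder features).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    score: float                            # confidence from the first (single-stage) inspection


def dual_inspection(
    detections: List[Detection],
    second_score: Callable[[Detection], float],  # hypothetical secondary inspector
    conf_threshold: float = 0.5,
    top_k: int = 1,
    weight: float = 0.5,                         # assumed fusion weight, not from the paper
) -> List[Detection]:
    # Results that pass the threshold are accepted as usual.
    accepted = [d for d in detections if d.score >= conf_threshold]
    # Results below the threshold are the "suspicious" part that may hide missed targets.
    suspicious = [d for d in detections if d.score < conf_threshold]

    # Combine the initial and secondary decisions (assumed: weighted average)
    # and rank the suspicious instances by the combined value.
    ranked = sorted(
        ((weight * d.score + (1.0 - weight) * second_score(d), d) for d in suspicious),
        key=lambda pair: pair[0],
        reverse=True,
    )

    # Give the combined confidence to the top-ranked instances; those that now
    # reach the threshold are recovered as objects.
    recovered = [
        Detection(d.box, combined)
        for combined, d in ranked[:top_k]
        if combined >= conf_threshold
    ]
    return accepted + recovered


# Toy usage with a stand-in secondary inspector.
detections = [
    Detection((0, 0, 10, 10), 0.9),
    Detection((5, 5, 8, 8), 0.4),
    Detection((1, 1, 3, 3), 0.2),
]
result = dual_inspection(detections, lambda d: 0.8 if d.score == 0.4 else 0.1)
```

In this toy run, the detection with initial score 0.4 receives a high secondary score, so its combined confidence crosses the threshold and it is recovered, while the detection with score 0.2 remains rejected.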





Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant No. 61703196, the Natural Science Foundation of Fujian Province under Grant 2020J01821 and the Key Science Foundation of Zhangzhou City under Grant ZZ2019ZD11.
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Tian, G., Liu, J., Zhao, H. et al. Small object detection via dual inspection mechanism for UAV visual images. Appl Intell 52, 4244–4257 (2022). https://doi.org/10.1007/s10489-021-02512-1
DOI: https://doi.org/10.1007/s10489-021-02512-1