Small object detection via dual inspection mechanism for UAV visual images

Applied Intelligence

Abstract

Unmanned Aerial Vehicles (UAVs) are used in place of humans to carry out aerial tasks in various fields. With the development of computer vision, object detection has become one of the core technologies in UAV applications. However, small targets are often missed, and detection performance on them is far below that on large targets. In this paper, we propose a dual inspection mechanism that identifies missed targets in suspicious areas to assist single-stage detection branches, and shares the dual decisions so that feature-level multi-instance detection modules produce reliable results. First, we confirm that the detection results contain missed targets, which lie among the results that do not reach the confidence threshold. For this reason, the feature vector provided by a denoising sparse autoencoder is computed, and this part of the results is filtered again. Second, we empirically show that a single detection result is not reliable enough and that multiple attributes of the target need to be considered. Motivated by this, the initial and secondary detection results are combined and ranked by importance. Finally, we assign a corresponding confidence to the top-ranked instances, making it possible for them to be recognized as objects again. Experimental results show that our mechanism improves mAP by 2.7% on the VisDrone2020 dataset, 1.0% on the UAVDT dataset and 1.8% on the MS COCO dataset. The proposed detection mechanism achieves state-of-the-art performance on these datasets and performs better on small object detection.
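The re-inspection flow outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Detection` type, the `second_score` callback (a stand-in for the score the paper derives from denoising sparse autoencoder features), the weighted-sum combination, and the `threshold`, `top_k` and `weight` parameters are all assumptions made here for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Detection:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    score: float                            # confidence from the first-pass detector


def dual_inspection(
    detections: List[Detection],
    second_score: Callable[[Detection], float],  # hypothetical secondary scorer
    threshold: float = 0.5,                      # assumed confidence threshold
    top_k: int = 1,                              # assumed number of candidates to re-admit
    weight: float = 0.5,                         # assumed mixing weight of the two scores
) -> List[Detection]:
    """Re-inspect below-threshold detections and re-admit the top-ranked ones."""
    accepted = [d for d in detections if d.score >= threshold]
    suspicious = [d for d in detections if d.score < threshold]

    # Combine the initial and secondary scores (a weighted sum is an assumption).
    ranked = sorted(
        ((weight * d.score + (1.0 - weight) * second_score(d), d) for d in suspicious),
        key=lambda pair: pair[0],
        reverse=True,
    )

    # Give the top-ranked candidates their combined score as a new confidence;
    # they become objects again only if that confidence reaches the threshold.
    recovered = [
        replace(d, score=combined)
        for combined, d in ranked[:top_k]
        if combined >= threshold
    ]
    return accepted + recovered
```

For example, with first-pass scores 0.9, 0.4 and 0.2 and a secondary scorer that rates the 0.4 candidate at 0.8, the combined score of that candidate is about 0.6, so it is re-admitted alongside the 0.9 detection while the 0.2 candidate stays rejected.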

Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant No. 61703196, the Natural Science Foundation of Fujian Province under Grant 2020J01821 and the Key Science Foundation of Zhangzhou City under Grant ZZ2019ZD11.

Author information

Corresponding author

Correspondence to Wenyuan Yang.

About this article

Cite this article

Tian, G., Liu, J., Zhao, H. et al. Small object detection via dual inspection mechanism for UAV visual images. Appl Intell 52, 4244–4257 (2022). https://doi.org/10.1007/s10489-021-02512-1

  • DOI: https://doi.org/10.1007/s10489-021-02512-1