DOI:10.1145/3446999.3447023
research-article
Open access

Multi-scale Feature Fusion UAV Image Object Detection Method Based on Dilated Convolution and Attention Mechanism

Published: 09 April 2021

Abstract

Because of the shooting angle and flight height, images captured by UAVs often have complex backgrounds and contain many small, unevenly distributed objects. To address the difficulty of accurately locating and recognizing small objects in UAV images with complex backgrounds, this paper proposes a multi-scale feature fusion algorithm, D-A-FS SSD (Dilated-Attention-Feature Fusion SSD), that combines dilated convolution with an attention mechanism. During feature extraction, dilated convolution expands the receptive field of the features, which improves the network's representation of object distribution and scale differences. An attention network is used to suppress background information. In the multi-scale detection stage, the method fuses the low-level feature map responsible for detecting small objects with the high-level feature map, which carries richer semantic information, to improve the recognition accuracy of small objects. Experimental results show that the method effectively improves the accuracy of UAV image object detection.
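
The three ingredients named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it works on 1-D lists instead of 2-D feature maps, and the abstract does not say which attention form or fusion operator D-A-FS SSD uses, so the sigmoid gate and the element-wise sum below are assumptions. All function names are hypothetical.

```python
import math


def effective_kernel(k, d):
    """Input span covered by a k-tap kernel with dilation rate d."""
    return k + (k - 1) * (d - 1)


def dilated_conv1d(x, w, d):
    """'Valid' 1-D dilated cross-correlation of signal x with taps w."""
    span = effective_kernel(len(w), d)
    return [
        sum(w[j] * x[i + j * d] for j in range(len(w)))
        for i in range(len(x) - span + 1)
    ]


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def attention_gate(features, scores):
    """Assumed attention form: scale each feature by a weight in (0, 1).

    Positions with strongly negative scores (background) are suppressed.
    """
    return [f * sigmoid(s) for f, s in zip(features, scores)]


def upsample_nearest(x, factor):
    """Nearest-neighbour upsampling: repeat every value `factor` times."""
    return [v for v in x for _ in range(factor)]


def fuse(low, high):
    """Assumed fusion: upsample the coarse high-level map to the size of
    the low-level map, then add element-wise."""
    if len(high) == 0 or len(low) % len(high) != 0:
        raise ValueError("low-level size must be a multiple of high-level size")
    up = upsample_nearest(high, len(low) // len(high))
    return [a + b for a, b in zip(low, up)]


# A 3-tap kernel with dilation 2 spans 5 input positions with the same 3 weights.
print(effective_kernel(3, 2))                                   # 5
print(dilated_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 1, 1], 2))      # [9, 12, 15]
# The second position has a very negative score and is driven close to 0.
print(attention_gate([2.0, 2.0], [0.0, -100.0]))
print(fuse([1, 2, 3, 4], [10, 20]))                             # [11, 12, 23, 24]
```

The dilation example shows why the receptive field grows without extra parameters: the kernel still has three weights, but they sample positions two apart. The fusion example shows the coarse map being brought up to the resolution of the fine map before the two are combined; concatenation would be an equally plausible choice.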

Published In

ICIT '20: Proceedings of the 2020 8th International Conference on Information Technology: IoT and Smart City
December 2020
266 pages
ISBN:9781450388559
DOI:10.1145/3446999
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Attention mechanism
  2. Dilated convolution
  3. Feature fusion
  4. Multi-scale detection

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICIT 2020
ICIT 2020: IoT and Smart City
December 25 - 27, 2020
Xi'an, China
