A lightweight multi-scale aggregated model for detecting aerial images captured by UAVs

https://doi.org/10.1016/j.jvcir.2021.103058

Highlights

  • Multi-scale detection heads detect objects of various scales in aerial images.

  • Multi-scale fusion of attention-weighted feature maps enhances detail information.

  • The loss weights are redistributed to balance accuracy across categories.

  • Channel pruning compresses the model to accelerate detection.

Abstract

Detecting objects of interest in aerial images captured by UAVs is one of the core modules in UAV-based applications. However, detecting objects in aerial images is very difficult: the scale of objects varies greatly, and detection must also meet real-time requirements. To deal with these challenges, we propose a lightweight model named DSYolov3. We make the following improvements to the Yolov3 model: 1) a multiple scale-aware decision discrimination network to detect objects at different scales, 2) a multi-scale fusion-based channel attention model to exploit channel-wise information complementation, and 3) sparsity-based channel pruning to compress the model. Extensive experimental evaluation has demonstrated the effectiveness and efficiency of our approach. The proposed approach not only achieves better performance than most existing detectors but also keeps the model practicable on UAVs.

Introduction

An Unmanned Aerial Vehicle (UAV) is an aircraft that carries no human pilot or passengers; it is guided autonomously or by remote control [1]. Equipped with high-definition cameras, UAVs can be remarkably efficient at surveillance tasks. Compared with surveillance cameras with fixed position and angle, the cameras mounted on UAVs have unique advantages, such as varying shooting angles and positions. Owing to these particularities, UAVs have been widely used in many applications, such as object detection [2], target tracking [3], and path planning [4].

Detecting objects of interest in images captured by UAVs is one of the core modules in UAV-based applications, and it has received increasing attention from multiple research communities in recent years. Visual object detection aims to determine the location of a target (e.g., pedestrians or vehicles) in a natural image; it has been extensively studied in computer vision in past years and is becoming increasingly helpful in satellite and aviation monitoring [5], smart city construction [6], and the construction of ecological civilization [7]. The UAV is an ideal platform for object detection since its aerial imagery system captures the scene from a top-down view. However, compared with object detection in ordinary images, detecting objects in aerial images taken by UAVs differs significantly: first, objects are mostly viewed from top-down or oblique angles, which makes them harder to recognize; second, although object size changes with the shooting angle, the overhead viewpoint means that fewer objects overlap each other, so occlusion requires less attention. Furthermore, traditional machine learning detection algorithms for UAVs [8] cannot achieve good detection performance because their hand-crafted features are not robust to such visual diversity. With the rapid development of deep learning frameworks, visual object detection has achieved great success for general objects [9], [10]. Despite this progress, generic detectors are usually designed for images captured by ground-based cameras and cannot be directly transferred to UAV images. This is mainly due to two facts:

  • To fulfill the detection task, UAVs are required to process the captured data efficiently. However, traditional deep object detectors, such as Faster-RCNN [11], usually incur a high computational cost because large numbers of convolutional operations are involved during detection, and the light weight and flexibility requirements of UAVs limit their ability to carry sufficient resources for computing-intensive online tasks. Although some lightweight object detectors, such as SSD [9] and Yolov3 [10], run fast and achieve promising performance, they are mainly designed for generic objects and are not ideal for UAV images.

  • Traditional deep object detectors usually require high-resolution images as input for high detection accuracy. However, the images captured by UAVs have varying object scales and usually contain many very small objects with low resolution, which are very difficult to detect. To cope with this problem, improved algorithms have been proposed for small object detection, such as SNIP [12] and Cascade-RCNN [13]. These algorithms greatly improve the detection accuracy of small targets, but most of them rely on an RPN [11] and a pyramid structure, which results in slow inference and weak model versatility; they are more suitable for offline detection tasks with low real-time requirements. Recently, Zhang et al. proposed the SlimYolov3 [14] detector, which performs detection in real time and is suitable for deployment on UAVs, but its accuracy needs improvement.

Despite some pioneering efforts on object detection in UAV images, it remains an under-explored task due to the above-mentioned challenges. To fill the research gap, this paper contributes a deep convolutional network-based fast object detector, named Deep Slight Yolov3 (DSYolov3) and built on the classical Yolov3 [10] framework, to better adapt the widely used Yolo series models to detecting objects of different proportions in images captured by UAVs. Compared with the original Yolov3 framework, we propose a network structure with two additional components to make the model work better for scale-varying object detection. First, we utilize multiple detection heads connected to different layers of the backbone network to detect objects at different scales. Then, we design a Multi-scale Fusion of Channel Attention Model (MFCAM) to exploit channel-wise information complementation. In addition, a practical object detector for UAVs has strict limits on model parameters for high efficiency. To reduce the model redundancy of manually designed object detectors [15], [16], we design a simple and effective model pruning strategy that compresses the proposed DSYolov3 framework by discarding unimportant components without significantly affecting performance, so that it runs efficiently in UAV applications.
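To make the idea of multiple detection heads concrete, the following PyTorch sketch attaches a Yolo-style 1x1 prediction head to each of five backbone feature maps. It is only an illustration: the channel widths, feature-map resolutions, anchor count and class count are assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn

class MultiScaleHeads(nn.Module):
    """Yolo-style prediction heads attached to five feature maps.

    Illustrative sketch only: the channel widths, anchor count and class
    count below are assumptions, not the values used in DSYolov3.
    """
    def __init__(self, in_channels=(64, 128, 256, 512, 1024),
                 num_classes=10, num_anchors=3):
        super().__init__()
        out_ch = num_anchors * (5 + num_classes)  # (x, y, w, h, objectness) + classes
        self.heads = nn.ModuleList(
            nn.Conv2d(c, out_ch, kernel_size=1) for c in in_channels
        )

    def forward(self, features):
        # `features`: five backbone feature maps, highest resolution first.
        return [head(f) for head, f in zip(self.heads, features)]

# Usage with dummy feature maps of decreasing resolution:
feats = [torch.randn(1, c, s, s)
         for c, s in zip((64, 128, 256, 512, 1024), (104, 52, 26, 13, 7))]
preds = MultiScaleHeads()(feats)
print([p.shape for p in preds])  # five prediction tensors, one per detection level
```

The finer levels keep high spatial resolution for small objects, while the coarser levels carry the semantics needed to classify larger ones.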

Compared with most existing small object detectors, our method is a one-stage detector that obtains real-time performance while achieving better accuracy. After model pruning, the model is suitable for deployment on UAVs. Our contributions are summarized as follows.

  • To fully combine the features from different layers of a CNN backbone and detect objects of different sizes, we optimize the Yolov3 model by extending the original three-level decision discrimination network to five levels. Detecting objects at these different scales locates small objects and classifies them more accurately.

  • We propose a module named MFCAM (Multi-scale Fusion of Channel Attention Model), which combines a channel-wise attention mechanism with spatial pyramid pooling to fuse channel-wise features at different scales for scale-varying object detection tasks (see the sketch after this list).

  • We adopt a simple model pruning strategy [15], [16] to balance accuracy and computational complexity by discarding unimportant channels, so that the model can run efficiently in UAV applications (a pruning sketch also follows this list).
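The full MFCAM definition appears in Section 3, which is not included in this preview. The sketch below is therefore only one plausible reading of "channel attention combined with spatial pyramid pooling": descriptors pooled at several spatial scales drive an SE-style channel re-weighting. The pool sizes and reduction ratio are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MFCAMSketch(nn.Module):
    """Channel attention driven by multi-scale pooled descriptors.

    A hedged approximation of MFCAM: the exact wiring, pool sizes
    (1, 2, 4 here) and reduction ratio (16 here) are assumptions.
    """
    def __init__(self, channels, pool_sizes=(1, 2, 4), reduction=16):
        super().__init__()
        self.pool_sizes = pool_sizes
        descriptor_len = channels * sum(s * s for s in pool_sizes)
        self.fc = nn.Sequential(
            nn.Linear(descriptor_len, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        # Pool the map at several scales and flatten into one descriptor per image.
        descriptor = torch.cat(
            [F.adaptive_avg_pool2d(x, s).reshape(n, -1) for s in self.pool_sizes],
            dim=1,
        )
        weights = self.fc(descriptor).view(n, c, 1, 1)  # per-channel weights in (0, 1)
        return x * weights                              # re-weighted feature map

# Usage: re-weight a 256-channel feature map.
y = MFCAMSketch(256)(torch.randn(2, 256, 26, 26))
```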

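The pruning strategy cites [15], [16], which prune channels according to batch-normalization scaling factors learned under an L1 sparsity penalty. A minimal sketch of that idea follows; the sparsity strength, prune ratio and helper names are hypothetical, and the surgery that actually rebuilds the slimmer network is omitted.

```python
import torch
import torch.nn as nn

def l1_sparsity_on_bn(model: nn.Module, strength: float = 1e-4) -> torch.Tensor:
    """L1 penalty on BatchNorm scaling factors (gamma), added to the detection
    loss during sparsity training so unimportant channels shrink toward zero."""
    return strength * sum(m.weight.abs().sum()
                          for m in model.modules()
                          if isinstance(m, nn.BatchNorm2d))

def bn_channel_masks(model: nn.Module, prune_ratio: float = 0.5) -> dict:
    """Global thresholding step of network-slimming-style pruning [16]:
    channels whose |gamma| falls below a global quantile are marked for removal."""
    gammas = torch.cat([m.weight.data.abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.quantile(gammas, prune_ratio)
    return {name: (m.weight.data.abs() > threshold)
            for name, m in model.named_modules()
            if isinstance(m, nn.BatchNorm2d)}
```

During sparsity training the penalty is added to the detection loss; after training, the masks decide which convolution channels are kept when the compact model is rebuilt and fine-tuned.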
Extensive experimental evaluation on the VisDrone2018-Det benchmark dataset and the UAVDT dataset shows that the proposed model achieves promising performance on the small object detection task and that the pruned model is applicable in UAV applications. The rest of this paper is organized as follows: Section 2 briefly reviews related work on deep learning-based object detection, attention mechanisms, and model compression. Section 3 introduces our proposed framework and the adopted approaches, followed by extensive experimental details in Section 4. Finally, we analyze our approach in Section 5 and conclude the work in Section 6.

Section snippets

Object detection in deep learning

Deep learning solves a problem from raw data in an end-to-end manner. Since AlexNet [17] was proposed and achieved remarkable classification accuracy in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [18], researchers in computer vision have gradually focused their research on deep learning. In 2014, R-CNN introduced deep learning methods into object detection, which improved detection substantially and began to reshape object detection approaches. R-CNN adopts selective

The proposed approach

In this paper, we aim to deploy a deep learning model on UAVs to detect small objects. As shown in Fig. 4, we develop a deep Yolov3 framework to address the challenge that the original Yolov3 obtains unsatisfactory accuracy when detecting objects in images captured by UAVs.

The overall architecture of the proposed DSYolov3 is shown in Fig. 4. In the method, we use Darknet [10] as the backbone network and design a Multi-scale Fusion of Channel Attention Model (MFCAM) to extract low-level location

Datasets

In this section, we evaluate the proposed DSYolov3 model on the VisDrone2018-Det [28] dataset and UAVDT [57] dataset in extensive experiments.

Discussion

The goal of our method is to upgrade the Yolo model and compress its volume so that it can be deployed on UAVs and complete the object detection task quickly and accurately.

Although there is a certain number of normal-sized objects, most objects in images captured by UAVs are small and difficult to identify due to a lack of sufficient features. We adopt a five-level decision discrimination network instead of the three-level one in the original Yolov3; this network is designed to detect objects of

Conclusion

In this paper, we address the problem of object detection in aerial images captured by UAVs and propose a model to detect small objects more accurately and efficiently. The proposed model, called DSYolov3, is derived from the original Yolov3. To make it work well on small object detection, we introduce two components to fully exploit the multi-scale global information in the feature channel dimension and to fuse the features thoroughly. First, we propose a network structure with a

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was supported in part by the National Key R&D Program of China under grant 2018AAA0102002, and in part by the National Natural Science Foundation of China (NSFC) under grants 61976076, 61632007, 61932009 and 61806066.

References (61)

  • Arthur P. Cracknell, UAVs: regulations and law enforcement, Int. J. Remote Sens. (2017)
  • Xindi Zhang et al., Dense and small object detection in UAV vision based on cascade network
  • Chunhui Zhang et al., Accurate UAV tracking with distance-injected overlap maximization
  • Vincent Roberge et al., Comparison of parallel genetic algorithm and particle swarm optimization for real-time UAV path planning, IEEE Trans. Ind. Informat. (2012)
  • Qingpeng Li, Lichao Mou, Qizhi Xu, Yun Zhang, Xiao Xiang Zhu, R3-net: A deep network for multi-oriented vehicle...
  • Xiaofei Yang et al., Road detection and centerline extraction via deep recurrent convolutional neural network U-Net, IEEE Trans. Geosci. Remote Sens. (2019)
  • Benjamin Kellenberger et al., Half a percent of labels is enough: Efficient animal detection in UAV imagery using deep CNNs and active learning, IEEE Trans. Geosci. Remote Sens. (2019)
  • Yakoub Bazi et al., Convolutional SVM networks for object detection in UAV imagery, IEEE Trans. Geosci. Remote Sens. (2018)
  • Wei Liu et al., SSD: Single shot multibox detector
  • Joseph Redmon, Ali Farhadi, Yolov3: An incremental improvement, arXiv preprint arXiv:1804.02767, ...
  • Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, Faster R-CNN: Towards real-time object detection with region...
  • Bharat Singh et al., An analysis of scale invariance in object detection - SNIP
  • Zhaowei Cai et al., Cascade R-CNN: Delving into high quality object detection
  • Pengyi Zhang, Yunxin Zhong, Xiaoqiong Li, SlimYOLOv3: Narrower, faster and better for real-time UAV applications, arXiv...
  • Jianbo Ye, Xin Lu, Zhe Lin, James Z. Wang, Rethinking the smaller-norm-less-informative assumption in channel pruning...
  • Zhuang Liu et al., Learning efficient convolutional networks through network slimming
  • Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet classification with deep convolutional neural networks, ...
  • Olga Russakovsky et al., ImageNet large scale visual recognition challenge, Int. J. Comput. Vision (2015)
  • Ross Girshick et al., Rich feature hierarchies for accurate object detection and semantic segmentation
  • Kaiming He et al., Mask R-CNN
  • Chao Peng, Tete Xiao, Zeming Li, Yuning Jiang, Xiangyu Zhang, Kai Jia, Gang Yu, Jian Sun, MegDet: A large mini-batch...
  • Tsung-Yi Lin et al., Focal loss for dense object detection
  • Shifeng Zhang et al., Single-shot refinement neural network for object detection
  • Qijie Zhao et al., M2Det: A single-shot object detector based on multi-level feature pyramid network, Proceedings of the AAAI Conference on Artificial Intelligence (2019)
  • Ruchan Dong et al., Sig-NMS-based Faster R-CNN combining transfer learning for small target detection in VHR optical remote sensing imagery, IEEE Trans. Geosci. Remote Sens. (2019)
  • Zheng Yang et al., Detecting small objects in urban settings using SlimNet model, IEEE Trans. Geosci. Remote Sens. (2019)
  • Qi Wang et al., Weakly supervised adversarial domain adaptation for semantic segmentation in urban scenes, IEEE Trans. Image Process. (2019)
  • Pengfei Zhu, Longyin Wen, Dawei Du, Xiao Bian, Haibin Ling, Qinghua Hu, Qinqin Nie, Hao Cheng, Chenfeng Liu, Xiaoyu...
  • Fan Yang et al., Clustered object detection in aerial images
  • Ren Jin et al., Adaptive anchor for fast object detection in aerial image, IEEE Geosci. Remote Sens. Lett. (2019)

    This paper has been recommended for acceptance by Zicheng Liu.
