A lightweight multi-scale aggregated model for detecting aerial images captured by UAVs

https://doi.org/10.1016/j.jvcir.2021.103058

Highlights

  • Multi-scale detection heads detect objects of various scales in aerial images.

  • Multi-scale fusion of attention-weighted feature maps enhances detail information.

  • The loss weights are redistributed to balance accuracy across categories.

  • Channel pruning compresses the model to accelerate detection.

Abstract

Detecting objects of interest in aerial images captured by UAVs is one of the core modules in UAV-based applications. However, detecting objects in aerial images is very difficult: the scale of objects varies greatly, and detection must also meet real-time requirements. To deal with these challenges, we propose a lightweight model named DSYolov3. We make the following improvements to the Yolov3 model: 1) a multiple scale-aware decision discrimination network to detect objects at different scales, 2) a multi-scale fusion-based channel attention model to exploit channel-wise information complementation, and 3) sparsity-based channel pruning to compress the model. Extensive experimental evaluation has demonstrated the effectiveness and efficiency of our approach. The proposed approach not only achieves better performance than most existing detectors but also keeps the model practicable on UAVs.

Introduction

An Unmanned Aerial Vehicle (UAV) is an aircraft that carries no human pilot or passengers; it is guided autonomously or by remote control [1]. Equipped with high-definition cameras, UAVs can be remarkably efficient at surveillance tasks. Compared with surveillance cameras with fixed position and angle, the cameras mounted on UAVs have unique advantages, such as varying shooting angles and positions. Owing to these particularities, UAVs have been widely used in many applications, such as object detection [2], target tracking [3], and path planning [4].

Detecting objects of interest in images captured by UAVs is one of the core modules in UAV-based applications, and it has received increasing attention from multiple research communities in recent years. Visual object detection aims to determine the location of a target (e.g., pedestrians or vehicles) in a natural image; it has been extensively studied in computer vision in past years and is becoming increasingly helpful in satellite and aviation monitoring [5], smart city construction [6], and the construction of ecological civilization [7]. The UAV is an ideal platform for object detection since its aerial imagery system captures the scene from a top-down view. However, compared with object detection in ordinary images, detecting objects in aerial images taken by UAVs differs significantly: first, objects are mostly viewed from top-down or oblique angles, which makes them harder to recognize; second, although object size changes with the shooting angle, the overhead viewpoint means that fewer objects overlap each other, so occlusion requires less attention. Furthermore, traditional machine learning detection algorithms for UAVs [8] cannot achieve good detection performance because their hand-crafted features are not robust to such visual diversity. With the rapid development of deep learning frameworks, visual object detection has achieved great success for general objects [9], [10]. Despite this progress, generic detectors are usually designed for images captured by ground-based cameras and cannot be directly transferred to UAV images. This is mainly due to two facts:

  • To fulfill the detection task, UAVs are required to process the captured data efficiently. However, traditional deep object detectors, such as Faster-RCNN [11], usually incur a high computational cost because large numbers of convolutional operations are involved during detection, and the light weight and flexibility requirements of UAVs limit their ability to carry sufficient resources for computing-intensive online tasks. Although some lightweight object detectors, such as SSD [9] and Yolov3 [10], run fast and achieve promising performance, they are mainly designed for generic objects and are not ideal for UAV images.

  • Traditional deep object detectors usually require high-resolution images as input for high detection accuracy. However, the images captured by UAVs have varying object scales and usually contain many very small objects with low resolution, which are very difficult to detect. To cope with this problem, improved algorithms have been proposed for small object detection, such as SNIP [12] and Cascade-RCNN [13]. These algorithms greatly improve the detection accuracy of small targets, but most of them rely on an RPN [11] and a pyramid structure, which results in slow inference and weak model versatility; they are more suitable for offline detection tasks with low real-time requirements. Recently, Zhang et al. proposed the SlimYolov3 [14] detector, which performs detection in real time and is suitable for deployment on UAVs, but its accuracy needs improvement.

Despite some pioneering efforts on object detection in UAV images, it remains an under-explored task due to the above-mentioned challenges. To fill the research gap, this paper contributes a deep convolutional network-based fast object detector, named Deep Slight Yolov3 (DSYolov3) and built on the classical Yolov3 [10] framework, to better adapt the widely used Yolo series models to detecting objects of different proportions in images captured by UAVs. Compared with the original Yolov3 framework, we propose a network structure with two additional components to make the model work better for scale-varying object detection. First, we utilize multiple detection heads connected to different layers of the backbone network to detect objects at different scales. Then, we design a Multi-scale Fusion of Channel Attention Model (MFCAM) to exploit channel-wise information complementation. In addition, a practical object detector for UAVs has strict limits on model parameters for high efficiency. To reduce the model redundancy of manually designed object detectors [15], [16], we design a simple and effective model pruning strategy that compresses the proposed DSYolov3 framework by discarding unimportant components without significantly affecting performance, so that it runs efficiently in UAV applications.
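To make the idea of multiple detection heads concrete, the following PyTorch sketch attaches a Yolo-style 1x1 prediction head to each of five backbone feature maps. It is only an illustration: the channel widths, feature-map resolutions, anchor count and class count are assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn

class MultiScaleHeads(nn.Module):
    """Yolo-style prediction heads attached to five feature maps.

    Illustrative sketch only: the channel widths, anchor count and class
    count below are assumptions, not the values used in DSYolov3.
    """
    def __init__(self, in_channels=(64, 128, 256, 512, 1024),
                 num_classes=10, num_anchors=3):
        super().__init__()
        out_ch = num_anchors * (5 + num_classes)  # (x, y, w, h, objectness) + classes
        self.heads = nn.ModuleList(
            nn.Conv2d(c, out_ch, kernel_size=1) for c in in_channels
        )

    def forward(self, features):
        # `features`: five backbone feature maps, highest resolution first.
        return [head(f) for head, f in zip(self.heads, features)]

# Usage with dummy feature maps of decreasing resolution:
feats = [torch.randn(1, c, s, s)
         for c, s in zip((64, 128, 256, 512, 1024), (104, 52, 26, 13, 7))]
preds = MultiScaleHeads()(feats)
print([p.shape for p in preds])  # five prediction tensors, one per detection level
```

The finer levels keep high spatial resolution for small objects, while the coarser levels carry the semantics needed to classify larger ones.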

Compared with most existing small object detectors, our method is a one-stage detector that obtains real-time performance while achieving better accuracy. After model pruning, the model is suitable for deployment on UAVs. Our contributions are summarized as follows.

  • To fully combine the features from different layers of a CNN backbone and detect objects of different sizes, we optimize the Yolov3 model by extending the original three-level decision discrimination network to five levels. Detecting objects at these different scales locates small objects and classifies them more accurately.

  • We propose a module named MFCAM (Multi-scale Fusion of Channel Attention Model), which combines a channel-wise attention mechanism with spatial pyramid pooling to fuse channel-wise features at different scales for scale-varying object detection tasks (see the sketch after this list).

  • We adopt a simple model pruning strategy [15], [16] to balance accuracy and computational complexity by discarding unimportant channels, so that the model can run efficiently in UAV applications (a pruning sketch also follows this list).
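The full MFCAM definition appears in Section 3, which is not included in this preview. The sketch below is therefore only one plausible reading of "channel attention combined with spatial pyramid pooling": descriptors pooled at several spatial scales drive an SE-style channel re-weighting. The pool sizes and reduction ratio are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MFCAMSketch(nn.Module):
    """Channel attention driven by multi-scale pooled descriptors.

    A hedged approximation of MFCAM: the exact wiring, pool sizes
    (1, 2, 4 here) and reduction ratio (16 here) are assumptions.
    """
    def __init__(self, channels, pool_sizes=(1, 2, 4), reduction=16):
        super().__init__()
        self.pool_sizes = pool_sizes
        descriptor_len = channels * sum(s * s for s in pool_sizes)
        self.fc = nn.Sequential(
            nn.Linear(descriptor_len, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        # Pool the map at several scales and flatten into one descriptor per image.
        descriptor = torch.cat(
            [F.adaptive_avg_pool2d(x, s).reshape(n, -1) for s in self.pool_sizes],
            dim=1,
        )
        weights = self.fc(descriptor).view(n, c, 1, 1)  # per-channel weights in (0, 1)
        return x * weights                              # re-weighted feature map

# Usage: re-weight a 256-channel feature map.
y = MFCAMSketch(256)(torch.randn(2, 256, 26, 26))
```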

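The pruning strategy cites [15], [16], which prune channels according to batch-normalization scaling factors learned under an L1 sparsity penalty. A minimal sketch of that idea follows; the sparsity strength, prune ratio and helper names are hypothetical, and the surgery that actually rebuilds the slimmer network is omitted.

```python
import torch
import torch.nn as nn

def l1_sparsity_on_bn(model: nn.Module, strength: float = 1e-4) -> torch.Tensor:
    """L1 penalty on BatchNorm scaling factors (gamma), added to the detection
    loss during sparsity training so unimportant channels shrink toward zero."""
    return strength * sum(m.weight.abs().sum()
                          for m in model.modules()
                          if isinstance(m, nn.BatchNorm2d))

def bn_channel_masks(model: nn.Module, prune_ratio: float = 0.5) -> dict:
    """Global thresholding step of network-slimming-style pruning [16]:
    channels whose |gamma| falls below a global quantile are marked for removal."""
    gammas = torch.cat([m.weight.data.abs().flatten()
                        for m in model.modules()
                        if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.quantile(gammas, prune_ratio)
    return {name: (m.weight.data.abs() > threshold)
            for name, m in model.named_modules()
            if isinstance(m, nn.BatchNorm2d)}
```

During sparsity training the penalty is added to the detection loss; after training, the masks decide which convolution channels are kept when the compact model is rebuilt and fine-tuned.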
Extensive experimental evaluation on the VisDrone2018-Det benchmark dataset and the UAVDT dataset shows that the proposed model achieves promising performance on the small object detection task and that the pruned model is applicable in UAV applications. The rest of this paper is organized as follows: Section 2 briefly reviews related work on deep learning-based object detection, attention mechanisms, and model compression. Section 3 introduces our proposed framework and the adopted approaches, followed by extensive experimental details in Section 4. Finally, we analyze our approach in Section 5 and conclude the work in Section 6.

Section snippets

Object detection in deep learning

Deep learning solves a problem from raw data in an end-to-end manner. Since AlexNet [17] was proposed and achieved remarkable classification accuracy in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [18], researchers in computer vision have gradually focused their research on deep learning. In 2014, R-CNN introduced deep learning methods into object detection, which improved detection substantially and began to reshape object detection approaches. R-CNN adopts selective

The proposed approach

In this paper, we aim to deploy a deep learning model on UAVs to detect small objects. As shown in Fig. 4, we develop a deep Yolov3 framework to address the challenge that the original Yolov3 obtains unsatisfactory accuracy when detecting objects in images captured by UAVs.

The overall architecture of the proposed DSYolov3 is shown in Fig. 4. In the method, we use Darknet [10] as the backbone network and design a Multi-scale Fusion of Channel Attention Model (MFCAM) to extract low-level location

Datasets

In this section, we evaluate the proposed DSYolov3 model on the VisDrone2018-Det [28] dataset and UAVDT [57] dataset in extensive experiments.

Discussion

The goal of our method is to upgrade the Yolo model and compress its volume so that it can be deployed on UAVs and complete the object detection task quickly and accurately.

Although there is a certain number of normal-sized objects, most objects in images captured by UAVs are small and difficult to identify due to a lack of sufficient features. We adopt a five-level decision discrimination network instead of the three-level one in the original Yolov3; this network is designed to detect objects of

Conclusion

In this paper, we address the problem of object detection in aerial images captured by UAVs and propose a model to detect small objects more accurately and efficiently. The proposed model, called DSYolov3, is derived from the original Yolov3. To make it work well on small object detection, we introduce two components to fully exploit the multi-scale global information in the feature channel dimension and to fuse the features thoroughly. First, we propose a network structure with a

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work was supported in part by the National Key R&D Program of China under grant 2018AAA0102002, and in part by the National Natural Science Foundation of China (NSFC) under grants 61976076, 61632007, 61932009 and 61806066.

References (61)

  • Arthur P. Cracknell, UAVs: regulations and law enforcement, Int. J. Remote Sens. (2017)
  • Xindi Zhang et al., Dense and small object detection in UAV vision based on cascade network
  • Chunhui Zhang et al., Accurate UAV tracking with distance-injected overlap maximization
  • Vincent Roberge et al., Comparison of parallel genetic algorithm and particle swarm optimization for real-time UAV path planning, IEEE Trans. Ind. Informat. (2012)
  • Qingpeng Li, Lichao Mou, Qizhi Xu, Yun Zhang, Xiao Xiang Zhu, R3-net: A deep network for multi-oriented vehicle...
  • Xiaofei Yang et al., Road detection and centerline extraction via deep recurrent convolutional neural network U-Net, IEEE Trans. Geosci. Remote Sens. (2019)
  • Benjamin Kellenberger et al., Half a percent of labels is enough: Efficient animal detection in UAV imagery using deep CNNs and active learning, IEEE Trans. Geosci. Remote Sens. (2019)
  • Yakoub Bazi et al., Convolutional SVM networks for object detection in UAV imagery, IEEE Trans. Geosci. Remote Sens. (2018)
  • Wei Liu et al., SSD: Single shot multibox detector
  • Joseph Redmon, Ali Farhadi, Yolov3: An incremental improvement, arXiv preprint arXiv:1804.02767, ...
  • Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, Faster R-CNN: Towards real-time object detection with region...
  • Bharat Singh et al., An analysis of scale invariance in object detection - SNIP
  • Zhaowei Cai et al., Cascade R-CNN: Delving into high quality object detection
  • Pengyi Zhang, Yunxin Zhong, Xiaoqiong Li, SlimYOLOv3: Narrower, faster and better for real-time UAV applications, arXiv...
  • Jianbo Ye, Xin Lu, Zhe Lin, James Z. Wang, Rethinking the smaller-norm-less-informative assumption in channel pruning...
  • Zhuang Liu et al., Learning efficient convolutional networks through network slimming
  • Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet classification with deep convolutional neural networks, ...
  • Olga Russakovsky et al., ImageNet large scale visual recognition challenge, Int. J. Comput. Vision (2015)
  • Ross Girshick et al., Rich feature hierarchies for accurate object detection and semantic segmentation
  • Kaiming He et al., Mask R-CNN
  • Chao Peng, Tete Xiao, Zeming Li, Yuning Jiang, Xiangyu Zhang, Kai Jia, Gang Yu, Jian Sun, MegDet: A large mini-batch...
  • Tsung-Yi Lin et al., Focal loss for dense object detection
  • Shifeng Zhang et al., Single-shot refinement neural network for object detection
  • Qijie Zhao et al., M2Det: A single-shot object detector based on multi-level feature pyramid network, Proceedings of the AAAI Conference on Artificial Intelligence (2019)
  • Ruchan Dong et al., Sig-NMS-based Faster R-CNN combining transfer learning for small target detection in VHR optical remote sensing imagery, IEEE Trans. Geosci. Remote Sens. (2019)
  • Zheng Yang et al., Detecting small objects in urban settings using SlimNet model, IEEE Trans. Geosci. Remote Sens. (2019)
  • Qi Wang et al., Weakly supervised adversarial domain adaptation for semantic segmentation in urban scenes, IEEE Trans. Image Process. (2019)
  • Pengfei Zhu, Longyin Wen, Dawei Du, Xiao Bian, Haibin Ling, Qinghua Hu, Qinqin Nie, Hao Cheng, Chenfeng Liu, Xiaoyu...
  • Fan Yang et al., Clustered object detection in aerial images
  • Ren Jin et al., Adaptive anchor for fast object detection in aerial image, IEEE Geosci. Remote Sens. Lett. (2019)

    This paper has been recommended for acceptance by Zicheng Liu.
