
ISDNet: AI-enabled Instance Segmentation of Aerial Scenes for Smart Cities

Published: 10 August 2021

Abstract

Aerial scenes captured by UAVs have immense potential in IoT applications such as urban surveillance, road and building segmentation, and land cover classification, all of which are necessary for the evolution of smart cities. Advances in deep learning have greatly enhanced visual understanding, but the domain of aerial vision remains largely unexplored. Aerial images pose many unique challenges for scene parsing, including high-resolution data, small-scale objects, a large number of objects in the camera view, dense clustering of objects, and background clutter, which greatly hinder the performance of existing deep learning methods. In this work, we propose ISDNet (Instance Segmentation and Detection Network), a novel network that performs instance segmentation and object detection on visual data captured by UAVs. This work enables aerial image analytics for various needs in a smart city. In particular, we use dilated convolutions to generate improved spatial context, leading to better discrimination between foreground and background features. The proposed network efficiently reuses the segment-mask features by propagating them from early stages using residual connections. Furthermore, ISDNet uses effective anchors to accommodate varying object scales and sizes. The proposed method obtains state-of-the-art results in the aerial context.
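To illustrate why dilated convolutions widen spatial context, the following minimal Python sketch computes the span covered by a dilated kernel and applies a naive dilated 2D convolution. It is not the ISDNet implementation: the function names `effective_kernel_size` and `dilated_conv2d`, the box kernel, and the 7×7 toy image are all assumptions chosen for illustration.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span (in pixels) covered by one dilated kernel along one axis."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv2d(image, kernel, dilation=1):
    """Naive 'valid' 2D cross-correlation with a square dilated kernel.

    No padding, stride 1. Kernel taps are spaced `dilation` pixels apart,
    so the same number of weights sees a wider neighbourhood.
    """
    k = len(kernel)
    span = effective_kernel_size(k, dilation)
    out_h = len(image) - span + 1
    out_w = len(image[0]) - span + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    acc += kernel[a][b] * image[i + a * dilation][j + b * dilation]
            row.append(acc)
        output.append(row)
    return output


if __name__ == "__main__":
    # Hypothetical 7x7 toy image and 3x3 box kernel (illustrative only).
    image = [[float(r * 7 + c) for c in range(7)] for r in range(7)]
    box = [[1.0] * 3 for _ in range(3)]
    for d in (1, 2, 3):
        out = dilated_conv2d(image, box, dilation=d)
        print(f"dilation={d}: span={effective_kernel_size(3, d)}, "
              f"output={len(out)}x{len(out[0])}")
```

With a 3×3 kernel, dilation rates of 1, 2, and 3 cover spans of 3, 5, and 7 pixels while still using nine weights, which is the sense in which dilation adds context without adding parameters.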



Published In

ACM Transactions on Internet Technology  Volume 21, Issue 3
August 2021
522 pages
ISSN:1533-5399
EISSN:1557-6051
DOI:10.1145/3468071
  • Editor: Ling Liu
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 10 August 2021
Accepted: 01 July 2020
Revised: 01 July 2020
Received: 01 April 2020
Published in TOIT Volume 21, Issue 3

Author Tags

  1. Smart cities
  2. UAVs
  3. deep learning
  4. instance segmentation
  5. object detection
  6. aerial scenes

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • BITS Additional Competitive Research

Bibliometrics & Citations

Article Metrics

  • Downloads (last 12 months): 59
  • Downloads (last 6 weeks): 3
Reflects downloads up to 16 Feb 2025

Cited By

  • (2025) Learning to Aggregate Multi-Scale Context for Instance Segmentation in Remote Sensing Images. IEEE Transactions on Neural Networks and Learning Systems 36:1 (595-609). DOI: 10.1109/TNNLS.2023.3336563. Online publication date: Jan-2025
  • (2025) Interpretability Analysis of Data Augmented Convolutional Neural Network in Mineral Prospectivity Mapping Using Black-Box Visualization Tools. Natural Resources Research. DOI: 10.1007/s11053-025-10462-5. Online publication date: 31-Jan-2025
  • (2024) Impact of Artificial Intelligence in Urban Tourism. Recent trends in Management and Commerce 5:2 (70-74). DOI: 10.46632/rmc/5/2/15. Online publication date: 1-Jul-2024
  • (2024) The Role of IoT in Shaping the Future of Geospatial AI. Recent Trends in Geospatial AI (177-216). DOI: 10.4018/979-8-3693-8054-3.ch007. Online publication date: 13-Dec-2024
  • (2024) Modeling and Analysis of Kamikaze UAV Design with 3 Different Wing Configurations. Journal of Mathematical Sciences and Modelling (90-103). DOI: 10.33187/jmsm.1505481. Online publication date: 15-Aug-2024
  • (2024) Evaluating the Effectiveness of Panoptic Segmentation Through Comparative Analysis. Bitlis Eren Üniversitesi Fen Bilimleri Dergisi 13:3 (681-691). DOI: 10.17798/bitlisfen.1473041. Online publication date: 26-Sep-2024
  • (2024) Adaptive Pruning of Channel Spatial Dependability in Convolutional Neural Networks. Proceedings of the 32nd ACM International Conference on Multimedia (6073-6082). DOI: 10.1145/3664647.3681419. Online publication date: 28-Oct-2024
  • (2024) A Comprehensive Review on Limitations of Autonomous Driving and Its Impact on Accidents and Collisions. IEEE Open Journal of Vehicular Technology 5 (142-161). DOI: 10.1109/OJVT.2023.3335180. Online publication date: 2024
  • (2024) A transformer-based UAV instance segmentation model TF-YOLOv7. Signal, Image and Video Processing 18:4 (3299-3308). DOI: 10.1007/s11760-023-02992-3. Online publication date: 9-Feb-2024
  • (2023) A survey on state-of-the-art computing for cyber-physical systems. 2nd International Conference on Recent Advances in Computational Techniques (020001). DOI: 10.1063/5.0150080. Online publication date: 2023
