Monocular 3D Pose Estimation of Very Small Airplane in the Air

Authors:

Younggun Lee

MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia

Article No.: 82, Pages 1 - 7

https://doi.org/10.1145/3595916.3626456

Published: 01 January 2024

Abstract

This paper proposes a novel pose estimation algorithm designed specifically for airplanes maneuvering in the air. The algorithm has two main stages. The first stage performs semantic segmentation on a monocular input image of a flying airplane; because the airplane typically occupies only a small part of the image, the entire segmented region serves as its feature points. The second stage estimates the 3D pose from the segmented image using projective registration. Because airplanes have unique characteristics and airplane-specific datasets are scarce, a custom dataset is generated for the experiments using Unreal Engine 4, a 3D computer graphics game engine renowned for its realistic simulations. Experimental results demonstrate that the algorithm is suitable for 3D pose estimation of airplanes and provides valuable information for studying autonomous airplane control.
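
The abstract does not spell out how the projective registration stage works, so the following is only a minimal sketch of one possible reading: project a known 3D model under candidate poses with a pinhole camera and keep the pose whose silhouette best overlaps the segmentation mask. Everything here is an assumption made for illustration and not the paper's method: the toy point model, the helper names (`rotation_z`, `project_to_mask`, `iou`, `estimate_yaw`), the brute-force search over a single rotation angle, and the IoU score. The segmentation stage is replaced by a synthetically rendered mask.

```python
import numpy as np

def rotation_z(yaw):
    """Rotation matrix about the camera's optical (z) axis."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def project_to_mask(points, R, t, focal, size):
    """Pinhole-project 3D model points and mark the pixels they hit."""
    cam = points @ R.T + t  # model frame -> camera frame
    u = np.round(focal * cam[:, 0] / cam[:, 2] + size / 2).astype(int)
    v = np.round(focal * cam[:, 1] / cam[:, 2] + size / 2).astype(int)
    mask = np.zeros((size, size), dtype=bool)
    inside = (u >= 0) & (u < size) & (v >= 0) & (v < size)
    mask[v[inside], u[inside]] = True
    return mask

def iou(a, b):
    """Intersection over union of two boolean masks."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0

def estimate_yaw(target_mask, points, t, focal, size, candidates):
    """Return the candidate angle whose projected silhouette best matches the mask."""
    scores = [
        iou(target_mask, project_to_mask(points, rotation_z(y), t, focal, size))
        for y in candidates
    ]
    return candidates[int(np.argmax(scores))]

# Hypothetical toy "airplane": an asymmetric fuselage along x plus wings along y.
line = np.linspace(0.0, 1.0, 61)
zeros = np.zeros_like(line)
fuselage = np.stack([-1.0 + 3.0 * line, zeros, zeros], axis=1)
wings = np.stack([zeros, -1.5 + 3.0 * line, zeros], axis=1)
model = np.vstack([fuselage, wings])

t = np.array([0.0, 0.0, 20.0])                   # assumed distance to the camera
candidates = np.deg2rad(np.arange(0, 360, 15))   # coarse search grid
true_yaw = np.deg2rad(45)

# Stand-in for the segmentation output: a mask rendered at the true pose.
observed = project_to_mask(model, rotation_z(true_yaw), t, 200.0, 64)
estimate = estimate_yaw(observed, model, t, 200.0, 64, candidates)
print(np.rad2deg(estimate))
```

With these assumed values the airplane spans only about 30 pixels of a 64 x 64 image, which loosely mirrors the small-target setting the abstract describes. A full 3D pose search would cover all rotation axes and translation and would need a real segmentation mask as input.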

Index Terms

Monocular 3D Pose Estimation of Very Small Airplane in the Air
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
  2. Computer graphics
    1. Animation
    2. Shape modeling

Index terms have been assigned to the content through auto-classification.

Published In

MMAsia '23: Proceedings of the 5th ACM International Conference on Multimedia in Asia

December 2023

745 pages

ISBN: 9798400702051

DOI: 10.1145/3595916

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Agency for Defense Development

Conference

MMAsia '23

Sponsor:

SIGMM

MMAsia '23: ACM Multimedia Asia

December 6 - 8, 2023

Tainan, Taiwan

Acceptance Rates

Overall Acceptance Rate 59 of 204 submissions, 29%
