Abstract
Most research efforts that address UAV path optimization in a Reinforcement Learning (RL) setting focus on closed spaces or urban areas as the operating environment. The problem of Tactical UAV (TUAV) path planning under hostile radar tracking threat has peculiarities that distinguish it from other typical UAV path optimization problems: (1) spatial regions delineated by threat probabilities may be legitimately penetrable under certain conditions that do not impair the survivability of the UAV, and (2) a TUAV is detectable by a radar via its Radar Cross Section (RCS), which is a function of multiple parameters such as the radar operating frequency, the shape of the UAV and, more importantly, the engagement geometry between the radar and the UAV. The latter implies that any maneuver performed by the UAV may change several of the angles that specify the engagement geometry. The work presented in this paper proposes an RL-based solution to this complex problem in a novel way by (1) implementing a Markov Decision Process (MDP)-compliant RL environment with comprehensive probabilistic radar behavior models incorporated into it, and (2) integrating a core RL algorithm (DQN with Prioritized Experience Replay, DQN-PER) with a specific variant of transfer learning (learning from demonstrations, LfD) in a single framework, demonstrating the utility of combining a core RL algorithm with a machine learning scheme to boost the performance of a learning agent and, more importantly, to alleviate the sparse reward problem.
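The DQN-PER component mentioned above replays stored transitions non-uniformly, favoring those with large temporal-difference (TD) error. As a rough illustration of that mechanism (not the paper's implementation — class and parameter names here are illustrative, and this list-based sketch omits the sum-tree structure used for efficiency), a minimal proportional prioritized replay buffer might look like this:

```python
import random


class PrioritizedReplayBuffer:
    """Minimal sketch of proportional prioritized experience replay.

    Transitions are sampled with probability P(i) = p_i^alpha / sum_k p_k^alpha,
    and each sample carries an importance weight (N * P(i))^(-beta) to correct
    the bias introduced by non-uniform sampling.
    """

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha, self.beta, self.eps = alpha, beta, eps
        self.data, self.priorities = [], []
        self.pos = 0  # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # replayed at least once before their TD error is known.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(max_p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        n = len(self.data)
        weights = [(n * probs[i]) ** (-self.beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]  # normalize for stability
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, priorities are refreshed from |TD error|,
        # with a small eps so no transition's probability collapses to zero.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a full DQN-PER loop, `sample` feeds minibatches to the Q-network update (with the returned weights scaling each sample's loss) and `update_priorities` is called with the fresh TD errors after every gradient step.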
Notes
These operational parameters and performance characteristics are normally made accessible to tactical mission planners via well-maintained data repositories called Electronic Warfare Databases (EWDBs), which contain a wide range of technical parameters. It is therefore fairly common to assume that such information is available for a range of mainstream threats.
Acknowledgements
The work reported in this paper has been partially supported by TÜBİTAK BİLGEM, for which I am grateful. I also thank Mr. İlhan D. Tüfekçi, now with ASELSAN A.Ş., Ankara, Turkey, for his valuable input regarding radar modeling.
Ethics declarations
Conflict of interest
The author declares that he has no conflict of interest.
Cite this article
Alpdemir, M.N. Tactical UAV path optimization under radar threat using deep reinforcement learning. Neural Comput & Applic 34, 5649–5664 (2022). https://doi.org/10.1007/s00521-021-06702-3