Abstract
Most research efforts that address UAV path optimization in a Reinforcement Learning (RL) setting focus on closed spaces or urban areas as the operating environment. The problem of Tactical UAV (TUAV) path planning under hostile radar tracking threat has peculiarities that distinguish it from other typical UAV path optimization problems: (1) spatial regions delineated by threat probabilities may be legitimately penetrable under certain conditions that do not impair the survivability of the UAV, and (2) a TUAV is detectable by a radar via its Radar Cross Section (RCS), which is a function of multiple parameters such as the radar operating frequency, the shape of the UAV and, more importantly, the engagement geometry between the radar and the UAV. The latter implies that any maneuver performed by the UAV may change several of the angles that specify the engagement geometry. The work presented in this paper proposes an RL-based solution to this complex problem in a novel way by (1) implementing a Markov Decision Process (MDP)-compliant RL environment with comprehensive probabilistic radar behavior models incorporated into it, and (2) integrating a core RL algorithm (DQN with Prioritized Experience Replay, DQN-PER) with a specific variant of transfer learning (learning from demonstrations, LfD) in a single framework, demonstrating the utility of combining a core RL algorithm with a machine learning scheme to boost the performance of a learning agent and, more importantly, to alleviate the sparse reward problem.
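The DQN-PER component mentioned above replays stored transitions non-uniformly, favoring those with large temporal-difference (TD) error. As a rough illustration of that mechanism (not the paper's implementation — class and parameter names here are illustrative, and this list-based sketch omits the sum-tree structure used for efficiency), a minimal proportional prioritized replay buffer might look like this:

```python
import random


class PrioritizedReplayBuffer:
    """Minimal sketch of proportional prioritized experience replay.

    Transitions are sampled with probability P(i) = p_i^alpha / sum_k p_k^alpha,
    and each sample carries an importance weight (N * P(i))^(-beta) to correct
    the bias introduced by non-uniform sampling.
    """

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha, self.beta, self.eps = alpha, beta, eps
        self.data, self.priorities = [], []
        self.pos = 0  # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # replayed at least once before their TD error is known.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(max_p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        n = len(self.data)
        weights = [(n * probs[i]) ** (-self.beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]  # normalize for stability
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, priorities are refreshed from |TD error|,
        # with a small eps so no transition's probability collapses to zero.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a full DQN-PER loop, `sample` feeds minibatches to the Q-network update (with the returned weights scaling each sample's loss) and `update_priorities` is called with the fresh TD errors after every gradient step.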
Notes
These operational parameters and performance characteristics are normally made accessible to tactical mission planners via well-maintained data repositories called Electronic Warfare Databases (EWDBs), which contain a wide range of technical parameters. It is therefore fairly common to assume that such information is available for a range of mainstream threats.
Acknowledgements
The work reported in this paper has been partially supported by TÜBİTAK BİLGEM, for which I am grateful. I also thank Mr. İlhan D. Tüfekçi, now with ASELSAN A.Ş., Ankara, Turkey, for his valuable input regarding radar modeling.
Ethics declarations
Conflict of interest
The author declares that he has no conflict of interest.
Cite this article
Alpdemir, M.N. Tactical UAV path optimization under radar threat using deep reinforcement learning. Neural Comput & Applic 34, 5649–5664 (2022). https://doi.org/10.1007/s00521-021-06702-3