
Tactical UAV path optimization under radar threat using deep reinforcement learning

  • Original Article
  • Published in Neural Computing and Applications

Abstract

Most research efforts that address UAV path optimization in a Reinforcement Learning (RL) setting focus on closed spaces or urban areas as the operating environment. The problem of Tactical UAV (TUAV) path planning under hostile radar tracking threat has peculiarities that distinguish it from other typical UAV path optimization problems. In particular: (1) spatial regions delineated by threat probabilities may be legitimately penetrable under certain conditions that do not impair the survivability of the UAV, and (2) a TUAV is detectable by a radar via its Radar Cross Section (RCS), which is a function of multiple parameters such as the radar operating frequency, the shape of the UAV and, more importantly, the engagement geometry between the radar and the UAV. The latter implies that any maneuver performed by the UAV may change multiple angles that specify the engagement geometry. The work presented in this paper proposes an RL-based solution to this complex problem in a novel way by (1) implementing a Markov Decision Process (MDP)-compliant RL environment with comprehensive probabilistic radar behavior models incorporated into it, and (2) integrating a core RL algorithm (namely, DQN with Prioritized Experience Replay (DQN-PER)) with a specific variant of transfer learning (namely, learning from demonstrations (LfD)) in a single framework. This demonstrates the utility of combining a core RL algorithm with a machine learning scheme to boost the performance of a learning agent and, more importantly, to alleviate the sparse reward problem.
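To make the second point concrete, the sketch below shows in broad strokes how a prioritized replay buffer can be seeded with expert demonstrations so that early updates see informative, reward-bearing transitions despite sparse rewards. This is a minimal illustration of the general DQN-PER + LfD idea, not the author's implementation; all class, parameter, and variable names (PrioritizedReplay, demo_bonus, and so on) are assumptions introduced for this sketch.

    import random
    from collections import namedtuple

    Transition = namedtuple("Transition",
                            "state action reward next_state done is_demo")

    class PrioritizedReplay:
        """Proportional prioritized replay (Schaul et al. style) with a
        priority bonus for demonstration transitions, which are never
        evicted. Illustrative sketch only."""

        def __init__(self, capacity, alpha=0.6, demo_bonus=0.5, eps=1e-3):
            self.capacity, self.alpha = capacity, alpha
            self.demo_bonus, self.eps = demo_bonus, eps
            self.buffer, self.priorities = [], []

        def add(self, tr, td_error=1.0):
            # Priority grows with TD error; demos get a constant bonus so
            # they keep being replayed even after their TD error shrinks.
            bonus = self.demo_bonus if tr.is_demo else 0.0
            p = (abs(td_error) + bonus + self.eps) ** self.alpha
            if len(self.buffer) >= self.capacity:
                # Evict the oldest agent-generated transition; keep demos.
                for i, old in enumerate(self.buffer):
                    if not old.is_demo:
                        del self.buffer[i]
                        del self.priorities[i]
                        break
            self.buffer.append(tr)
            self.priorities.append(p)

        def sample(self, batch_size):
            # Sample proportionally to priority.
            total = sum(self.priorities)
            return random.choices(self.buffer,
                                  weights=[p / total for p in self.priorities],
                                  k=batch_size)

    # Before any environment interaction, seed the buffer with recorded
    # expert trajectories so the agent learns from demonstrations first.
    replay = PrioritizedReplay(capacity=100_000)
    demo_transitions = []  # would be loaded from recorded expert episodes
    for s, a, r, s2, d in demo_transitions:
        replay.add(Transition(s, a, r, s2, d, is_demo=True))

Keeping demonstrations resident in the buffer with a small priority bonus is one common way to blend LfD with prioritized replay; other variants instead use a separate demonstration buffer or a supervised margin loss.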


Notes

  1. These operational parameters and performance characteristics are normally made accessible to tactical mission planners via well-maintained data repositories called Electronic Warfare Databases (EWDBs), which contain a wide range of technical parameters, so it is fairly common to assume the availability of such information for a range of mainstream threats.
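As a rough illustration of what such a repository entry might hold, the record below sketches a hypothetical EWDB-style threat description in Python. The field names and example values are assumptions made for illustration, not the schema of any real Electronic Warfare Database.

    from dataclasses import dataclass

    @dataclass
    class RadarThreatRecord:
        designation: str           # threat identifier as catalogued for planners
        operating_freq_ghz: float  # operating frequency (affects the UAV's RCS)
        peak_power_kw: float       # transmitter peak power
        detection_range_km: float  # nominal range against a reference target
        scan_period_s: float       # antenna scan period
        pfa: float                 # nominal probability of false alarm

    # Example entry with made-up values, for illustration only:
    threat = RadarThreatRecord("EW-SAMPLE-01", 9.4, 250.0, 120.0, 10.0, 1e-6)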


Acknowledgements

The work reported in this paper has been partially supported by TÜBİTAK BİLGEM, and I am grateful for that support. I also thank Mr. İlhan D. Tüfekçi, now with ASELSAN A.Ş., Ankara, Turkey, for his valuable input regarding radar modeling.

Author information

Corresponding author

Correspondence to M. Nedim Alpdemir.

Ethics declarations

Conflict of interest

The author declares that he has no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Alpdemir, M.N. Tactical UAV path optimization under radar threat using deep reinforcement learning. Neural Comput & Applic 34, 5649–5664 (2022). https://doi.org/10.1007/s00521-021-06702-3

