Abstract
In an unmanned aerial vehicle ad-hoc network (UANET), sparse and fast-moving unmanned aerial vehicles (UAVs), which act as the network nodes, continually change the network topology, and this can degrade UANET service performance. To plan for rapidly changing UAV swarms, we propose a dynamic value iteration network (DVIN) model. The model is trained with the episodic Q-learning method on the connection information of UANETs and generates a state value spread function, which enables UAVs/nodes to adapt to novel physical locations. We evaluate the DVIN model and compare it with the non-dominated sorting genetic algorithm II (NSGA-II) and the exhaustive method. Simulation results show that the proposed model significantly reduces the decision-making time for UAV/node path planning while achieving a high average success rate.
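Abstract (translated from Chinese)
In an unmanned aerial vehicle ad-hoc network (UANET), sparse and fast-moving UAV nodes dynamically change the network topology, which may cause UANET service performance problems. To plan for rapidly changing UAV swarms, this paper proposes a dynamic value iteration network (DVIN) model. The model uses the connection information of the UANET and is trained with the episodic Q-learning method to generate a state value spread function, which enables UAV nodes to adapt to new physical locations. The performance of the DVIN model is then evaluated and compared with the non-dominated sorting genetic algorithm NSGA-II and the exhaustive method. Simulation results show that the DVIN model significantly shortens the decision-making time for UAV node path planning and achieves a higher average success rate.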
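The idea of state values spreading outward from a target over the network's connection information can be illustrated with classic tabular value iteration on a small graph. This is only a minimal sketch under stated assumptions, not the DVIN model or its episodic Q-learning training: the five-node topology, the reward values, the discount factor, and the function names `value_iteration` and `greedy_path` are all hypothetical and chosen for illustration.

```python
from typing import Dict, List

# Hypothetical link list of a 5-node network: node id -> directly connected node ids.
ADJ: Dict[int, List[int]] = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [0, 3]}

GAMMA = 0.9         # discount factor (assumed)
STEP_COST = -1.0    # reward for an ordinary hop (assumed)
GOAL_REWARD = 10.0  # reward for the hop that reaches the goal (assumed)


def hop_value(nxt: int, goal: int, values: Dict[int, float]) -> float:
    """One-step lookahead value of moving to neighbour `nxt`."""
    reward = GOAL_REWARD if nxt == goal else STEP_COST
    return reward + GAMMA * values[nxt]


def value_iteration(adj: Dict[int, List[int]], goal: int, sweeps: int = 50) -> Dict[int, float]:
    """Repeated Bellman backups; values spread from the goal along the links."""
    values = {node: 0.0 for node in adj}
    for _ in range(sweeps):
        updated = {}
        for node, neighbours in adj.items():
            if node == goal or not neighbours:
                updated[node] = 0.0  # goal is terminal; isolated nodes cannot move
            else:
                updated[node] = max(hop_value(n, goal, values) for n in neighbours)
        values = updated
    return values


def greedy_path(adj: Dict[int, List[int]], values: Dict[int, float],
                start: int, goal: int, max_hops: int = 20) -> List[int]:
    """Follow the highest-valued neighbour from `start` until the goal or the hop limit."""
    path = [start]
    node = start
    while node != goal and len(path) <= max_hops and adj[node]:
        node = max(adj[node], key=lambda n: hop_value(n, goal, values))
        path.append(node)
    return path


values = value_iteration(ADJ, goal=3)
path = greedy_path(ADJ, values, start=0, goal=3)
print(values)
print(path)
```

If a link appears or disappears, re-running `value_iteration` on the updated adjacency list re-plans from scratch. The abstract's claim is that a learned model makes this decision step much faster than search-based baselines such as NSGA-II or exhaustive enumeration; the sketch does not attempt to reproduce that result.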
References
Abadi M, Barham P, Chen JM, et al., 2016. TensorFlow: a system for large-scale machine learning. Proc 12th USENIX Conf on Operating Systems Design and Implementation, p.265–283.
Bekmezci I, Sahingoz OK, Temel Ş, 2013. Flying ad-hoc networks (FANETs): a survey. Ad Hoc Netw, 11(3):1254–1270. https://doi.org/10.1016/j.adhoc.2012.12.004
Bellman R, 1966. Dynamic programming. Science, 153(3731):34–37. https://doi.org/10.1126/science.153.3731.34
Bertsekas DP, 1995. Dynamic Programming and Optimal Control. Athena Scientific, Belmont, USA.
Boureau YL, Bach F, LeCun Y, et al., 2010. Learning mid-level features for recognition. Proc IEEE Computer Society Conf on Computer Vision and Pattern Recognition, p.2559–2566. https://doi.org/10.1109/CVPR.2010.5539963
Buck I, Foley T, Horn D, et al., 2004. Brook for GPUs: stream computing on graphics hardware. ACM Trans Graph, 23(3):777–786. https://doi.org/10.1145/1015706.1015800
Challita U, Saad W, Bettstetter C, 2018. Deep reinforcement learning for interference-aware path planning of cellular-connected UAVs. Proc IEEE Int Conf on Communications, p.1–7. https://doi.org/10.1109/ICC.2018.8422706
Cruz F, Wüppen P, Fazrie A, et al., 2019. Action selection methods in a robotic reinforcement learning scenario. Proc IEEE Latin American Conf on Computational Intelligence, p.1–6. https://doi.org/10.1109/LA-CCI.2018.8625243
Deb K, Pratap A, Agarwal S, et al., 2002. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans Evol Comput, 6(2):182–197. https://doi.org/10.1109/4235.996017
Fontes RR, 2019. Emulando Redes Sem Fio Com Mininet-WiFi. https://github.com/ramonfontes/mn-wifi-book-pt/blob/master/preview-book.pdf
Fontes RR, Afzal S, Brito SHB, et al., 2015. Mininet-WiFi: emulating software-defined wireless networks. Proc 11th Int Conf on Network and Service Management, p.384–389. https://doi.org/10.1109/CNSM.2015.7367387
François-Lavet V, Henderson P, Islam R, et al., 2018. An introduction to deep reinforcement learning. Found Trends® Mach Learn, 11(3–4):219–354. https://doi.org/10.1561/2200000071
Koohifar F, Kumbhar A, Guvenc I, 2017. Receding horizon multi-UAV cooperative tracking of moving RF source. IEEE Commun Lett, 21(6):1433–1436. https://doi.org/10.1109/LCOMM.2016.2603977
Krizhevsky A, Sutskever I, Hinton GE, 2017. ImageNet classification with deep convolutional neural networks. Commun ACM, 60(6):84–90. https://doi.org/10.1145/3065386
Lee J, Kang BY, Kim DW, 2013. Fast genetic algorithm for robot path planning. Electron Lett, 49(23):1449–1451. https://doi.org/10.1049/el.2013.3143
Mnih V, Kavukcuoglu K, Silver D, et al., 2015. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533. https://doi.org/10.1038/nature14236
Mnih V, Badia AP, Mirza M, et al., 2016. Asynchronous methods for deep reinforcement learning. Proc 33rd Int Conf on Machine Learning, p.1928–1937.
Niu SF, Chen SH, Guo HY, et al., 2018. Generalized value iteration networks: life beyond lattices. Proc 32nd AAAI Conf on Artificial Intelligence, p.6246–6253.
Roberge V, Tarbouchi M, Labonte G, 2013. Comparison of parallel genetic algorithm and particle swarm optimization for real-time UAV path planning. IEEE Trans Ind Inform, 9(1):132–141. https://doi.org/10.1109/TII.2012.2198665
Schaal S, 1999. Is imitation learning the route to humanoid robots? Trends Cogn Sci, 3(6):233–242. https://doi.org/10.1016/s1364-6613(99)01327-3
Tamar A, Wu Y, Thomas G, et al., 2017. Value iteration networks. Proc 26th Int Joint Conf on Artificial Intelligence, p.4949–4953. https://doi.org/10.24963/ijcai.2017/700
Tokic M, Palm G, 2011. Value-difference based exploration: adaptive control between epsilon-greedy and softmax. Proc 34th Annual German Conf on Advances in Artificial Intelligence, p.335–346. https://doi.org/10.1007/978-3-642-24455-1_33
Watkins CJCH, Dayan P, 1992. Q-learning. Mach Learn, 8(3–4):279–292. https://doi.org/10.1007/BF00992698
Zhang CY, Patras P, Haddadi H, 2019. Deep learning in mobile and wireless networking: a survey. IEEE Commun Surv Tutor, 21(3):2224–2287. https://doi.org/10.1109/COMST.2019.2904897
Zhang T, Li Q, Zhang CS, et al., 2017. Current trends in the development of intelligent unmanned autonomous systems. Front Inform Technol Electron Eng, 18(1):68–85. https://doi.org/10.1631/FITEE.1601650
Author information
Contributions
Wei LI and Bowei YANG designed the research. Wei LI processed the data and drafted the manuscript. Guanghua SONG and Xiaohong JIANG helped organize the manuscript. Wei LI and Bowei YANG revised and finalized the paper.
Ethics declarations
Wei LI, Bowei YANG, Guanghua SONG, and Xiaohong JIANG declare that they have no conflict of interest.
Additional information
Project supported by the National Natural Science Foundation of China (No. 61501399), SAIC MOTOR (No. 1925), and the National Key R&D Program of China (No. 2018AAA0102302).
About this article
Cite this article
Li, W., Yang, B., Song, G. et al. Dynamic value iteration networks for the planning of rapidly changing UAV swarms. Front Inform Technol Electron Eng 22, 687–696 (2021). https://doi.org/10.1631/FITEE.1900712
DOI: https://doi.org/10.1631/FITEE.1900712
Key words
- Dynamic value iteration networks
- Episodic Q-learning
- Unmanned aerial vehicle (UAV) ad-hoc network
- Non-dominated sorting genetic algorithm II (NSGA-II)
- Path planning