Deep Reinforcement Learning Applied to Multi-agent Informative Path Planning in Environmental Missions

Yanes Luis, Samuel; Perales Esteve, Manuel; Gutiérrez Reina, Daniel; Toral Marín, Sergio

doi:10.1007/978-3-031-26564-8_2

ORCID (Samuel Yanes Luis): orcid.org/0000-0002-7796-3599

Part of the book series: Studies in Computational Intelligence (SCI, volume 1090)

Abstract

Deep Reinforcement Learning algorithms have gained attention lately due to their ability to solve complex decision problems with a model-free, derivative-free approach. In multi-agent problems, these algorithms can help to find efficient cooperative policies in a feasible amount of time. In this chapter, we present the Informative Patrolling Problem, a commonplace task in the conservation of water resources. The approach is presented here as a convenient methodology for the synthesis of cooperative policies that can address the simultaneous objectives present in the unmanned monitoring of lakes and rivers: maximizing the information collected about water parameters and routing multiple surface vehicles without collisions. For this mixed objective, a Deep Q-Learning scheme with a convolutional network as a shared fleet policy is proposed. To solve the credit assignment problem, an effective multi-agent decomposition of the informative reward is proposed, together with a discussion of several other state-of-the-art topics in Reinforcement Learning: noisy networks for enhanced exploration of the state-action domain, the use of visual states, and the shaping of the reward function. As is quantitatively demonstrated, this methodology allows a significant improvement in water resource monitoring compared to other heuristics.

Notes

1. https://marmenor.upct.es/maps/
2. As demonstrated in [21], and because \(\varSigma\) is a positive semi-definite matrix, \(|\varSigma| = \prod_{i=0}^{\dim(X)} \lambda_i\), where \(\lambda_i\) are the eigenvalues of \(\varSigma\).
3. In a fully deterministic environment, this probability is assumed to be 1.
4. \(\alpha = 0\) corresponds to fully uniform sampling; larger values of \(\alpha\) weight the sampling increasingly towards high-priority experiences.
5. From this point on, the decoupled reward is used because it performs better.
6. See https://deap.readthedocs.io/en/master/api/benchmarks.html for the complete definition.
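
Notes 2 and 4 can be checked numerically with a short sketch. It assumes that \(\alpha\) is the prioritization exponent of prioritized experience replay (Schaul et al., listed in the references), where transition \(i\) is sampled with probability \(p_i^\alpha / \sum_k p_k^\alpha\). The priorities, the 2x2 covariance matrix, and the function names below are hypothetical illustrations, not values or code from the chapter.

```python
import math


def sampling_probabilities(priorities, alpha):
    """Return P(i) = p_i**alpha / sum_k p_k**alpha for positive priorities."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def eigenvalues_sym_2x2(a, b, c):
    """Closed-form eigenvalues of the symmetric matrix [[a, b], [b, c]]."""
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b ** 2)
    return mean - radius, mean + radius


# Hypothetical priorities (e.g. absolute TD errors) of four stored transitions.
priorities = [0.1, 0.5, 1.0, 2.0]
uniform = sampling_probabilities(priorities, alpha=0.0)      # every entry equal
prioritized = sampling_probabilities(priorities, alpha=1.0)  # proportional to priority

# Hypothetical covariance matrix [[2.0, 0.6], [0.6, 1.0]] (positive definite).
lam_min, lam_max = eigenvalues_sym_2x2(2.0, 0.6, 1.0)
det_direct = 2.0 * 1.0 - 0.6 * 0.6   # determinant from the 2x2 formula
det_from_eigs = lam_min * lam_max    # determinant as the product of eigenvalues

print(uniform, prioritized)
print(det_direct, det_from_eigs)
```

With \(\alpha = 0\) every priority is raised to the power zero, so all transitions become equally likely regardless of their priority; the two determinant values agree up to floating-point error.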

References

Arzamendia M, Gregor D, Gutierrez-Reina D, Toral S (2019) An evolutionary approach to constrained path planning of an autonomous surface vehicle for maximizing the covered area of Ypacarai lake. Soft Comput 23(5):1723–1734
Arzamendia M, Gutierrez D, Toral S, Gregor D, Asimakopoulou E, Bessis N (2019) Intelligent online learning strategy for an autonomous surface vehicle in lake environments using evolutionary computation. IEEE Intell Transp Syst Mag 11(4):110–125
Bellman RE (2003) Dynamic Programming. Dover Publications Inc, USA
Coley K (2015) Unmanned surface vehicles: the future of data-collection. Ocean Chall 21:14–15
Cover TM, Thomas JA (2006) Elements of information theory. Wiley Series in telecommunications and signal processing. Wiley-Interscience, USA
Ferreira H, Almeida C, Martins A, Almeida J, Dias N, Dias A, Silva E (2009) Autonomous bathymetry for risk assessment with ROAZ robotic surface vehicle. In: OCEANS 2009-EUROPE, pp 1–6. https://doi.org/10.1109/OCEANSE.2009.5278235
Fortunato M, Azar MG, Piot B, Menick J, Osband I, Graves A, Mnih V, Munos R, Hassabis D, Pietquin O, Blundell C, Legg S (2017) Noisy networks for exploration. CoRR arXiv:1706.10295
van Hasselt H, Guez A, Silver D (2015) Deep reinforcement learning with double Q-learning. CoRR arXiv:1509.06461
Hoen PJ, Tuyls K, Panait L, Luke S, La Poutré JA (2006) An overview of cooperative and competitive multiagent learning. In: Tuyls K, Hoen PJ, Verbeeck K, Sen S (eds) Learning and adaption in multi-agent systems. Springer, Berlin, Heidelberg, pp 1–46
Julian KD, Kochenderfer MJ (2018) Distributed wildfire surveillance with autonomous aircraft using deep reinforcement learning. CoRR arXiv:1810.04244
Ten Kathen MJ, Flores IJ, Reina DG (2021) An informative path planner for a swarm of ASVs based on an enhanced PSO with Gaussian surrogate model components intended for water monitoring applications. Electronics 10(13):1605
Krishna Lakshmanan A, Elara Mohan R, Ramalingam B, Vu Le A, Veerajagadeshwar P, Tiwari K, Ilyas M (2020) Complete coverage path planning using reinforcement learning for tetromino based cleaning and maintenance robot. Autom Constr 112(May 2019):103078. https://doi.org/10.1016/j.autcon.2020.103078
Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2016) Continuous control with deep reinforcement learning. In: Bengio Y, LeCun Y (eds) ICLR, http://dblp.uni-trier.de/db/conf/iclr/iclr2016.html#LillicrapHPHETS15
Lowe R, Wu Y, Tamar A, Harb J, Abbeel P, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. NIPS’17, Curran Associates Inc., Red Hook, NY, USA
Mnih V, Kavukcuoglu K, Silver D et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533. https://doi.org/10.1038/nature14236
Murphy RR, Steimle E, Griffin C, Cullins C, Hall M, Pratt K (2008) Cooperative use of unmanned sea surface and micro aerial vehicles at hurricane Wilma. J Field Robot 25(3):164–180. https://doi.org/10.1002/rob.20235
Peralta F, Reina DG, Toral S, Arzamendia M, Gregor D (2021) A Bayesian optimization approach for multi-function estimation for environmental monitoring using an autonomous surface vehicle: Ypacarai lake case study. Electronics 10(8):963
Peralta Samaniego F, Reina DG, Toral Marín SL, Gregor DO, Arzamendia M (2021) A Bayesian optimization approach for water resources monitoring through an autonomous surface vehicle: the Ypacarai lake case study. IEEE Access 9(1):9163–9179. https://doi.org/10.1109/ACCESS.2021.3050934
Piciarelli C, Foresti GL (2019) Drone patrolling with reinforcement learning. ACM Int Conf Proc Ser 1:1–6. https://doi.org/10.1145/3349801.3349805
Popović M, Vidal-Calleja T, Hitz G (2020) An informative path planning framework for UAV-based terrain monitoring. Auton Robot 44:889–911. https://doi.org/10.1007/s10514-020-09903-2
Rasmussen C, Williams C (2006) Gaussian processes for machine learning. Adaptive computation and machine learning. MIT Press, Cambridge, MA, USA. https://doi.org/10.7551/mitpress/3206.003.0001
Sánchez-García J, García-Campos J, Arzamendia M, Reina D, Toral S, Gregor D (2018) A survey on unmanned aerial and aquatic vehicle multi-hop networks: Wireless communications, evaluation tools and applications. Comput Commun 119:43–65. https://doi.org/10.1016/j.comcom.2018.02.002
Schaul T, Quan J, Antonoglou I, Silver D (2015) Prioritized experience replay. arXiv:1511.05952
Sim R, Roy N (2005) Global A-optimal robot exploration in SLAM. pp 661–666. https://doi.org/10.1109/ROBOT.2005.1570193
Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. A Bradford Book, Cambridge, MA, USA
Ten Kathen MJ, Flores IJ, Reina DG (2021) A comparison of PSO-based informative path planners for autonomous surface vehicles for water resource monitoring. In: 7th international conference on machine learning technologies (ICMLT 2022). ACM
Ten Kathen MJ, Reina DG, Flores IJ (2021) A comparison of PSO-based informative path planners for detecting pollution peaks of the Ypacarai lake with autonomous surface vehicles. In: International conference on optimization and learning (OLA’2022)
Theile M, Bayerlein H, Nai R, Gesbert D, Caccamo M (2020) UAV coverage path planning under varying power constraints using deep reinforcement learning. In: 2020 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, pp 1444–1449
Viseras A, Garcia R (2019) DeepIG: multi-robot information gathering with deep reinforcement learning. IEEE Robot Autom Lett 4(3):3059–3066. https://doi.org/10.1109/LRA.2019.2924839
Viseras A, Meißner M, Marchal J (2021) Wildfire front monitoring with multiple UAVs using deep Q-learning. IEEE Access 1–1. https://doi.org/10.1109/ACCESS.2021.3055651
Wang Z, de Freitas N, Lanctot M (2015) Dueling network architectures for deep reinforcement learning. CoRR arXiv:1511.06581
Woo J, Kim N (2020) Collision avoidance for an unmanned surface vehicle using deep reinforcement learning. Ocean Eng 199:107001. https://doi.org/10.1016/j.oceaneng.2020.107001. www.sciencedirect.com/science/article/pii/S0029801820300792
Yanes Luis S, Reina DG, Toral Marín SL (2020) A deep reinforcement learning approach for the patrolling problem of water resources through autonomous surface vehicles: the Ypacarai lake case. IEEE Access 6(1):1–1. https://doi.org/10.1109/ACCESS.2020.3036938
Yanes Luis S, Reina DG, Toral Marín SL (2021) A multiagent deep reinforcement learning approach for path planning in autonomous surface vehicles: the Ypacaraí lake patrolling case. IEEE Access 9:17084–17099
Yanes Luis S, Gutiérrez-Reina D, Toral Marin S (2021) A dimensional comparison between evolutionary algorithm and deep reinforcement learning methodologies for autonomous surface vehicles with water quality sensors. Sensors 21(8). https://doi.org/10.3390/s21082862. https://www.mdpi.com/1424-8220/21/8/2862
Yanes Luis S, Peralta F, Tapia Córdoba A, Rodríguez del Nozal Á, Toral Marín S, Gutiérrez Reina D (2022) An evolutionary multi-objective path planning of a fleet of ASVs for patrolling water resources. Eng Appl Artif Intell 112:104852. www.sciencedirect.com/science/article/pii/S0952197622001051
Zhang Q, Lin J, Sha Q, He B, Li G (2020) Deep interactive reinforcement learning for path following of autonomous underwater vehicle. CoRR arXiv:2001.03359

Acknowledgements

This work has been funded by the Spanish “Ministerio de Ciencia, Innovación y Universidades” under the PhD grant FPU-2020 (Formación del Profesorado Universitario) of Samuel Yanes Luis.

Author information

Authors and Affiliations

Department of Electronics, University of Sevilla, Av. de Los Descubrimientos s/n, 41003, Sevilla, Spain
Samuel Yanes Luis, Manuel Perales Esteve, Daniel Gutiérrez Reina & Sergio Toral Marín

Corresponding author

Correspondence to Samuel Yanes Luis.

Editor information

Editors and Affiliations

Faculty of Computers and Artificial Intelligence, Benha University, Benha, Egypt
Ahmad Taher Azar
Electrical Engineering Department, University of Baghdad, College of Engineering, Baghdad, Iraq
Ibraheem Kasim Ibraheem
University of Technology, Baghdad, Iraq
Amjad Jaleel Humaidi

About this chapter

Cite this chapter

Yanes Luis, S., Perales Esteve, M., Gutiérrez Reina, D., Toral Marín, S. (2023). Deep Reinforcement Learning Applied to Multi-agent Informative Path Planning in Environmental Missions. In: Azar, A.T., Kasim Ibraheem, I., Jaleel Humaidi, A. (eds) Mobile Robot: Motion Control and Path Planning. Studies in Computational Intelligence, vol 1090. Springer, Cham. https://doi.org/10.1007/978-3-031-26564-8_2

DOI: https://doi.org/10.1007/978-3-031-26564-8_2
Published: 01 July 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-26563-1
Online ISBN: 978-3-031-26564-8
eBook Packages: Intelligent Technologies and Robotics (R0)
