Deep Reinforcement Learning Applied to Multi-agent Informative Path Planning in Environmental Missions

Mobile Robot: Motion Control and Path Planning

Abstract

Deep Reinforcement Learning algorithms have gained attention lately due to their ability to solve complex decision problems with a model-free and zero-derivative approach. In multi-agent problems, these algorithms can help find efficient cooperative policies in a feasible amount of time. In this chapter, we present the Informative Patrolling Problem, a commonplace task in the conservation of water resources. The approach is presented here as a convenient methodology for the synthesis of cooperative policies that can solve the simultaneous objectives present in the unmanned monitoring of lakes and rivers: maximizing the information collected on water parameters and routing multiple surface vehicles without collisions. For this mixed objective, a Deep Q-Learning scheme with a convolutional network as a shared fleet policy is proposed. To solve the credit assignment problem, an effective multi-agent decomposition of the informative reward is proposed, together with a discussion of several other state-of-the-art topics in Reinforcement Learning: noisy networks for enhanced exploration of the state-action domain, the use of visual states, and the shaping of the reward function. As demonstrated quantitatively, this methodology allows a significant improvement in water resource monitoring compared to other heuristics.

Notes

  1. https://marmenor.upct.es/maps/.

  2. As demonstrated in [21], and because \(\varSigma\) is a positive semi-definite matrix, \(|\varSigma| = \prod_{i=0}^{\dim(X)} \lambda_i\).

  3. In a totally deterministic environment, this probability is assumed to be 1.

  4. \(\alpha = 0\) means full uniform sampling and vice versa.

  5. From this point on, the decoupled reward is selected because of its better performance.

  6. See https://deap.readthedocs.io/en/master/api/benchmarks.html for the complete definition.
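
Notes 2 and 4 can be checked numerically. The following is a minimal sketch, not the chapter's implementation. It assumes NumPy, a hypothetical 2×2 covariance matrix and hypothetical priority values, and it assumes that \(\alpha\) in Note 4 is the priority exponent of prioritized experience replay [23], where transition \(i\) is sampled with probability \(p_i^{\alpha} / \sum_k p_k^{\alpha}\).

```python
import numpy as np


def sampling_probabilities(priorities, alpha):
    """Prioritized-replay sampling: P(i) = p_i^alpha / sum_k p_k^alpha.

    Assumption: this is the role alpha plays in Note 4 (see [23]).
    """
    scaled = np.asarray(priorities, dtype=float) ** alpha
    return scaled / scaled.sum()


# Note 2: the determinant of a covariance matrix equals the product
# of its eigenvalues. The matrix below is hypothetical.
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
det_direct = float(np.linalg.det(sigma))
# eigvalsh is intended for symmetric matrices and returns real eigenvalues.
det_eigen = float(np.prod(np.linalg.eigvalsh(sigma)))

# Note 4: alpha = 0 removes the effect of the priorities (uniform sampling);
# alpha = 1 samples proportionally to the priorities.
priorities = [1.0, 2.0, 4.0]  # hypothetical priority values
uniform = sampling_probabilities(priorities, alpha=0.0)
proportional = sampling_probabilities(priorities, alpha=1.0)

print(det_direct, det_eigen)
print(uniform, proportional)
```

With \(\alpha = 0\) every priority is raised to the power zero, so all transitions receive the same probability regardless of their priority, which is the uniform case described in Note 4.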

References

  1. Arzamendia M, Gregor D, Gutierrez-Reina D, Toral S (2019) An evolutionary approach to constrained path planning of an autonomous surface vehicle for maximizing the covered area of Ypacarai lake. Soft Comput 23(5):1723–1734

  2. Arzamendia M, Gutierrez D, Toral S, Gregor D, Asimakopoulou E, Bessis N (2019) Intelligent online learning strategy for an autonomous surface vehicle in lake environments using evolutionary computation. IEEE Intell Transp Syst Mag 11(4):110–125

  3. Bellman RE (2003) Dynamic Programming. Dover Publications Inc, USA

  4. Coley K (2015) Unmanned surface vehicles: the future of data-collection. Ocean Chall 21:14–15

  5. Cover TM, Thomas JA (2006) Elements of information theory. Wiley Series in telecommunications and signal processing. Wiley-Interscience, USA

  6. Ferreira H, Almeida C, Martins A, Almeida J, Dias N, Dias A, Silva E (2009) Autonomous bathymetry for risk assessment with ROAZ robotic surface vehicle. In: OCEANS 2009-EUROPE, pp 1–6. https://doi.org/10.1109/OCEANSE.2009.5278235

  7. Fortunato M, Azar MG, Piot B, Menick J, Osband I, Graves A, Mnih V, Munos R, Hassabis D, Pietquin O, Blundell C, Legg S (2017) Noisy networks for exploration. CoRR arXiv:1706.10295

  8. van Hasselt H, Guez A, Silver D (2015) Deep reinforcement learning with double Q-learning. CoRR arXiv:1509.06461

  9. Hoen PJ, Tuyls K, Panait L, Luke S, La Poutré JA (2006) An overview of cooperative and competitive multiagent learning. In: Tuyls K, Hoen PJ, Verbeeck K, Sen S (eds) Learning and adaption in multi-agent systems. Springer, Berlin, Heidelberg, pp 1–46

  10. Julian KD, Kochenderfer MJ (2018) Distributed wildfire surveillance with autonomous aircraft using deep reinforcement learning. CoRR arXiv:1810.04244

  11. Kathen MJT, Flores IJ, Reina DG (2021) An informative path planner for a swarm of ASVs based on an enhanced PSO with Gaussian surrogate model components intended for water monitoring applications. Electronics 10(13):1605

  12. Krishna Lakshmanan A, Elara Mohan R, Ramalingam B, Vu Le A, Veerajagadeshwar P, Tiwari K, Ilyas M (2020) Complete coverage path planning using reinforcement learning for tetromino based cleaning and maintenance robot. Autom Constr 112(May 2019):103078. https://doi.org/10.1016/j.autcon.2020.103078

  13. Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, Silver D, Wierstra D (2016) Continuous control with deep reinforcement learning. In: Bengio Y, LeCun Y (eds) ICLR, http://dblp.uni-trier.de/db/conf/iclr/iclr2016.html#LillicrapHPHETS15

  14. Lowe R, Wu Y, Tamar A, Harb J, Abbeel P, Mordatch I (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. NIPS’17, Curran Associates Inc., Red Hook, NY, USA

  15. Mnih V, Kavukcuoglu K, Silver D et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533. https://doi.org/10.1038/nature14236

  16. Murphy RR, Steimle E, Griffin C, Cullins C, Hall M, Pratt K (2008) Cooperative use of unmanned sea surface and micro aerial vehicles at hurricane Wilma. J Field Robot 25(3):164–180. https://doi.org/10.1002/rob.20235

  17. Peralta F, Reina DG, Toral S, Arzamendia M, Gregor D (2021) A Bayesian optimization approach for multi-function estimation for environmental monitoring using an autonomous surface vehicle: Ypacarai lake case study. Electronics 10(8):963

  18. Peralta Samaniego F, Reina DG, Toral Marín SL, Gregor DO, Arzamendia M (2021) A Bayesian optimization approach for water resources monitoring through an autonomous surface vehicle: the Ypacarai lake case study. IEEE Access 9(1):9163–9179. https://doi.org/10.1109/ACCESS.2021.3050934

  19. Piciarelli C, Foresti GL (2019) Drone patrolling with reinforcement learning. ACM Int Conf Proc Ser 1:1–6. https://doi.org/10.1145/3349801.3349805

  20. Popović M, Vidal-Calleja T, Hitz G (2020) An informative path planning framework for UAV-based terrain monitoring. Auton Robot 44:889–911. https://doi.org/10.1007/s10514-020-09903-2

  21. Rasmussen C, Williams C (2006) Gaussian processes for machine learning. Adaptive computation and machine learning. MIT Press, Cambridge, MA, USA. https://doi.org/10.7551/mitpress/3206.003.0001

  22. Sánchez-García J, García-Campos J, Arzamendia M, Reina D, Toral S, Gregor D (2018) A survey on unmanned aerial and aquatic vehicle multi-hop networks: Wireless communications, evaluation tools and applications. Comput Commun 119:43–65. https://doi.org/10.1016/j.comcom.2018.02.002

  23. Schaul T, Quan J, Antonoglou I, Silver D (2015) Prioritized experience replay. arXiv:1511.05952

  24. Sim R, Roy N (2005) Global A-optimal robot exploration in SLAM. pp 661–666. https://doi.org/10.1109/ROBOT.2005.1570193

  25. Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. A Bradford Book, Cambridge, MA, USA

  26. Ten Kathen MJ, Flores IJ, Reina DG (2021) A comparison of PSO-based informative path planners for autonomous surface vehicles for water resource monitoring. In: 7th international conference on machine learning technologies (ICMLT 2022). ACM

  27. Ten Kathen MJ, Reina DG, Flores IJ (2021) A comparison of PSO-based informative path planners for detecting pollution peaks of the Ypacarai lake with autonomous surface vehicles. In: International conference on optimization and learning (OLA’2022)

  28. Theile M, Bayerlein H, Nai R, Gesbert D, Caccamo M (2020) UAV coverage path planning under varying power constraints using deep reinforcement learning. In: 2020 IEEE/RSJ international conference on intelligent robots and systems (IROS). IEEE, pp 1444–1449

  29. Viseras A, Garcia R (2019) DeepIG: multi-robot information gathering with deep reinforcement learning. IEEE Robot Autom Lett 4(3):3059–3066. https://doi.org/10.1109/LRA.2019.2924839

  30. Viseras A, Meißner M, Marchal J (2021) Wildfire front monitoring with multiple UAVs using deep Q-learning. IEEE Access 1–1. https://doi.org/10.1109/ACCESS.2021.3055651

  31. Wang Z, de Freitas N, Lanctot M (2015) Dueling network architectures for deep reinforcement learning. CoRR arXiv:1511.06581

  32. Woo J, Kim N (2020) Collision avoidance for an unmanned surface vehicle using deep reinforcement learning. Ocean Eng 199:107001. https://doi.org/10.1016/j.oceaneng.2020.107001. www.sciencedirect.com/science/article/pii/S0029801820300792

  33. Yanes Luis S, Reina DG, Toral Marín SL (2020) A deep reinforcement learning approach for the patrolling problem of water resources through autonomous surface vehicles: the Ypacarai lake case. IEEE Access 6(1):1–1. https://doi.org/10.1109/ACCESS.2020.3036938

  34. Yanes Luis S, Reina DG, Marín SLT (2021) A multiagent deep reinforcement learning approach for path planning in autonomous surface vehicles: the Ypacaraí lake patrolling case. IEEE Access 9:17084–17099

  35. Yanes Luis S, Gutiérrez-Reina D, Toral Marin S (2021) A dimensional comparison between evolutionary algorithm and deep reinforcement learning methodologies for autonomous surface vehicles with water quality sensors. Sensors 21(8):2862. https://doi.org/10.3390/s21082862. https://www.mdpi.com/1424-8220/21/8/2862

  36. Yanes Luis S, Peralta F, Tapia Córdoba A, Rodríguez del Nozal Á, Toral Marín S, Gutiérrez Reina D (2022) An evolutionary multi-objective path planning of a fleet of ASVs for patrolling water resources. Eng Appl Artif Intell 112:104852. www.sciencedirect.com/science/article/pii/S0952197622001051

  37. Zhang Q, Lin J, Sha Q, He B, Li G (2020) Deep interactive reinforcement learning for path following of autonomous underwater vehicle. CoRR arXiv:2001.03359

Acknowledgements

This work has been funded by the Spanish “Ministerio de Ciencia, Innovación y Universidades” under the PhD grant FPU-2020 (Formación del Profesorado Universitario) of Samuel Yanes Luis.

Corresponding author

Correspondence to Samuel Yanes Luis.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Yanes Luis, S., Perales Esteve, M., Gutiérrez Reina, D., Toral Marín, S. (2023). Deep Reinforcement Learning Applied to Multi-agent Informative Path Planning in Environmental Missions. In: Azar, A.T., Kasim Ibraheem, I., Jaleel Humaidi, A. (eds) Mobile Robot: Motion Control and Path Planning. Studies in Computational Intelligence, vol 1090. Springer, Cham. https://doi.org/10.1007/978-3-031-26564-8_2
