Abstract
The problem of synthesizing mission plans for multiple autonomous agents, including path planning and task scheduling, is often complex. Employing model checking alone to solve it may not be feasible, especially when the number of agents grows or the requirements include real-time constraints. In this paper, we propose a novel approach called MCRL that integrates model checking and reinforcement learning to overcome this limitation. Our approach employs timed automata and timed computation tree logic to describe the autonomous agents' behavior and requirements, and trains the model via a reinforcement learning algorithm, namely Q-learning, to populate a table used to restrict the state space of the model. Our method provides means not only to synthesize mission plans for multi-agent systems whose complexity exceeds the scalability boundaries of exhaustive model checking, but also to analyze and verify the synthesized plans against the given requirements. We evaluate the proposed method on various scenarios involving autonomous agents, and compare it with two similar approaches, TAMAA and UPPAAL STRATEGO. The evaluation shows that MCRL performs better when the number of agents exceeds three.
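The learning phase described in the abstract rests on tabular Q-learning: a table of state-action values is populated by repeated interaction with the model, and high-value entries indicate which parts of the state space to keep. The following is a minimal sketch of such a training loop on a toy corridor environment; the environment function, reward values, and all hyperparameters are illustrative assumptions and do not reproduce the paper's timed-automata model.

```python
import random

def q_learning(n_states, n_actions, step, episodes=500, max_steps=200,
               alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning; returns a table q[state][action] of values.

    `step(state, action) -> (next_state, reward, done)` stands in for
    the environment; in MCRL this role would be played by the
    timed-automata model, which this sketch does not reproduce.
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = rng.randrange(n_states - 1)  # random non-goal start state
        for _ in range(max_steps):
            # epsilon-greedy action selection, ties broken at random
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(q[s])
                a = rng.choice([i for i in range(n_actions) if q[s][i] == best])
            s2, r, done = step(s, a)
            # one-step Q-learning update toward the bootstrapped target
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
            if done:
                break
    return q

# Toy 1-D corridor: states 0..4; action 1 moves right, action 0 moves
# left; reaching state 4 yields reward 1 and ends the episode.
def corridor(s, a):
    s2 = min(4, s + 1) if a == 1 else max(0, s - 1)
    return s2, (1.0 if s2 == 4 else 0.0), s2 == 4

q = q_learning(5, 2, corridor)
# Greedy policy from the learned table: the action with the highest
# value in each non-goal state.
policy = [max(range(2), key=lambda a: q[s][a]) for s in range(4)]
```

In the spirit of the abstract's "table used to restrict the state space", high-valued entries of such a Q-table would mark the state-action pairs worth keeping when pruning the space explored by the model checker; here the table merely encodes the shortest route to the goal state.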
Notes
- 1.
UPPAAL 4.1.22 was published in March 2019 on http://www.uppaal.org/.
- 2.
Further visual information on the mission plans in TAMAA can be found at http://doi.org/10.5281/zenodo.3731960.
References
Abdeddaïm, Y., Asarin, E., Maler, O., et al.: Scheduling with Timed Automata, vol. 354. Elsevier, Amsterdam (2006)
Alur, R., Dill, D.: Automata for modeling real-time systems. In: Paterson, M.S. (ed.) ICALP 1990. LNCS, vol. 443, pp. 322–335. Springer, Heidelberg (1990). https://doi.org/10.1007/BFb0032042
Behjati, R., Sirjani, M., Nili Ahmadabadi, M.: Bounded rational search for on-the-fly model checking of LTL properties. In: Arbab, F., Sirjani, M. (eds.) FSEN 2009. LNCS, vol. 5961, pp. 292–307. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-11623-0_17
Behrmann, G., Cougnard, A., David, A., Fleury, E., Larsen, K.G., Lime, D.: UPPAAL-Tiga: time for playing games! In: Damm, W., Hermanns, H. (eds.) CAV 2007. LNCS, vol. 4590, pp. 121–125. Springer, Heidelberg (2007). https://doi.org/10.1007/978-3-540-73368-3_14
Bengtsson, J., Yi, W.: Timed automata: semantics, algorithms and tools. In: Desel, J., Reisig, W., Rozenberg, G. (eds.) ACPN 2003. LNCS, vol. 3098, pp. 87–124. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-27755-2_3
Bouton, M., Cosgun, A., Kochenderfer, M.J.: Belief state planning for autonomously navigating urban intersections. In: Intelligent Vehicles Symposium, pp. 825–830. IEEE (2017)
Bouton, M., Karlsson, J., Nakhaei, A., Fujimura, K., Kochenderfer, M.J., Tumova, J.: Reinforcement learning with probabilistic guarantees for autonomous driving. arXiv preprint arXiv:1904.07189 (2019)
Bucklew, J.: Introduction to Rare Event Simulation. Springer, New York (2013). https://doi.org/10.1007/978-1-4757-4078-3
Chandler, P., Pachter, M.: Research issues in autonomous control of tactical UAVs. In: Proceedings of the 1998 American Control Conference. ACC (IEEE Cat. No. 98CH36207). IEEE (1998)
Clarke, E.M., Klieber, W., Nováček, M., Zuliani, P.: Model checking and the state explosion problem. In: Meyer, B., Nordio, M. (eds.) LASER 2011. LNCS, vol. 7682, pp. 1–30. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-35746-6_1
Daniel, K., Nash, A., Koenig, S., Felner, A.: Theta*: any-angle path planning on grids. J. Artif. Intell. Res. 39, 533–579 (2010)
David, A., et al.: Statistical model checking for stochastic hybrid systems (2012)
David, A., Jensen, P.G., Larsen, K.G., Mikučionis, M., Taankvist, J.H.: Uppaal Stratego. In: Baier, C., Tinelli, C. (eds.) TACAS 2015. LNCS, vol. 9035, pp. 206–211. Springer, Heidelberg (2015). https://doi.org/10.1007/978-3-662-46681-0_16
Dewey, D.: Reinforcement learning and the reward engineering principle. In: 2014 AAAI Spring Symposium Series (2014)
Fisher, H.: Probabilistic learning combinations of local job-shop scheduling rules. In: Industrial Scheduling, pp. 225–251. Prentice Hall, Englewood Cliffs (1963)
Franklin, S., Graesser, A.: Is it an agent, or just a program?: a taxonomy for autonomous agents. In: Müller, J.P., Wooldridge, M.J., Jennings, N.R. (eds.) ATAL 1996. LNCS, vol. 1193, pp. 21–35. Springer, Heidelberg (1997). https://doi.org/10.1007/BFb0013570
Gu, R., Enoiu, E.P., Seceleanu, C.: TAMAA: UPPAAL-based mission planning for autonomous agents. In: The 35th ACM/SIGAPP Symposium on Applied Computing SAC 2020, Brno, Czech Republic, 30 March 2020 (2019)
Gu, R., Marinescu, R., Seceleanu, C., Lundqvist, K.: Towards a two-layer framework for verifying autonomous vehicles. In: Badger, J.M., Rozier, K.Y. (eds.) NFM 2019. LNCS, vol. 11460, pp. 186–203. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-20652-9_12
Larsen, K.G., Legay, A.: On the power of statistical model checking. In: Margaria, T., Steffen, B. (eds.) ISoLA 2016. LNCS, vol. 9953, pp. 843–862. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-47169-3_62
LaValle, S.M.: Rapidly-exploring random trees: a new tool for path planning. Technical report (1998)
Li, X., Serlin, Z., Yang, G., Belta, C.: A formal methods approach to interpretable reinforcement learning for robotic planning. Sci. Robot. 4 (2019)
Mallozzi, P., Pardo, R., Duplessis, V., Pelliccione, P., Schneider, G.: MoVEMo: a structured approach for engineering reward functions. In: 2018 Second IEEE International Conference on Robotic Computing (IRC), pp. 250–257. IEEE (2018)
Nikou, A., Boskos, D., Tumova, J., Dimarogonas, D.V.: On the timed temporal logic planning of coupled multi-agent systems. Automatica 97, 339–345 (2018)
Pelánek, R.: Fighting state space explosion: review and evaluation. In: Cofer, D., Fantechi, A. (eds.) FMICS 2008. LNCS, vol. 5596, pp. 37–52. Springer, Heidelberg (2009). https://doi.org/10.1007/978-3-642-03240-0_7
Sutton, R.S., Barto, A.G., et al.: Introduction to Reinforcement Learning, vol. 2. MIT Press, Cambridge (1998)
Wang, Y., Chaudhuri, S., Kavraki, L.E.: Bounded policy synthesis for POMDPs with safe-reachability objectives. In: International Conference on Autonomous Agents and Multi Agent Systems. IFAAMS (2018)
Watkins, C.J.H.: Learning from Delayed Rewards. King’s College, Cambridge (1989)
Acknowledgement
The research leading to the presented results has been undertaken within the research profile DPAC - Dependable Platform for Autonomous Systems and Control project, funded by the Swedish Knowledge Foundation, grant number: 20150022.
Copyright information
© 2020 Springer Nature Switzerland AG
Cite this paper
Gu, R., Enoiu, E., Seceleanu, C., Lundqvist, K. (2020). Verifiable and Scalable Mission-Plan Synthesis for Autonomous Agents. In: ter Beek, M.H., Ničković, D. (eds) Formal Methods for Industrial Critical Systems. FMICS 2020. Lecture Notes in Computer Science(), vol 12327. Springer, Cham. https://doi.org/10.1007/978-3-030-58298-2_2
Print ISBN: 978-3-030-58297-5
Online ISBN: 978-3-030-58298-2