
Bayes-Adaptive Planning for Data-Efficient Verification of Uncertain Markov Decision Processes

  • Conference paper
Quantitative Evaluation of Systems (QEST 2019)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 11785)


Abstract

This work concerns discrete-time parametric Markov decision processes. These models capture the uncertainty in the transitions of partially unknown probabilistic systems with input actions by parameterising some of the entries in the stochastic matrix. Given a property expressed as a PCTL formula, we pursue a data-based verification approach that capitalises on the partial knowledge of the model and on experimental data obtained from the underlying system: after finding the set of parameters corresponding to model instances that satisfy the property, we quantify from data a measure (a confidence) of whether the system satisfies the property. The contribution of this work is a novel Bayes-adaptive planning algorithm, which synthesises finite-memory strategies from the model, allowing Bayes-optimal selection of actions. Actions are selected to collect data with the goal of increasing its information content pertinent to the property of interest: this active-learning goal aims at increasing the confidence on whether or not the system satisfies the given property.
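The confidence computation sketched in the abstract can be illustrated under strong simplifying assumptions: a single unknown transition probability with a Beta prior, Bernoulli observations of the corresponding transition, and a known satisfying parameter set given as an interval. The function `confidence_sat` below is a hypothetical illustration of this Beta-Bernoulli case, not the paper's algorithm; the posterior mass on the satisfying interval is estimated by Monte Carlo sampling.

```python
import random

def confidence_sat(successes, trials, theta_sat, prior=(1.0, 1.0),
                   n_samples=20000, seed=0):
    """Posterior probability that an unknown transition parameter lies in the
    satisfying set theta_sat = (lo, hi), given Bernoulli data.

    By Beta-Bernoulli conjugacy, a Beta(a, b) prior updated with k successes
    out of n trials yields a Beta(a + k, b + n - k) posterior; the returned
    confidence is a Monte Carlo estimate of its mass on [lo, hi].
    """
    a, b = prior
    lo, hi = theta_sat
    rng = random.Random(seed)
    post_a = a + successes
    post_b = b + (trials - successes)
    hits = sum(lo <= rng.betavariate(post_a, post_b) <= hi
               for _ in range(n_samples))
    return hits / n_samples

# Example: 90 observed successes out of 100 trials, property satisfied
# iff the parameter exceeds 0.8 (both numbers are illustrative).
conf = confidence_sat(90, 100, theta_sat=(0.8, 1.0))
```

With enough informative data the posterior concentrates inside (or outside) the satisfying set, which is precisely what the paper's active-learning strategy tries to accelerate by choosing which actions to sample.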


Notes

  1. Note that in Algorithm 1, we actually maintain an eligibility trace matrix \(\xi \).

  2. This scheme is sometimes called one-hot encoding.
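The two footnotes can be connected in a small sketch: with one-hot state features, a TD(\(\lambda\))-style update with linear function approximation maintains an eligibility trace alongside the weights. This is a generic illustration of the mechanism, not the paper's Algorithm 1; the step size, discount, and trace-decay values are arbitrary.

```python
import numpy as np

def one_hot(state, n_states):
    """Footnote 2: represent a discrete state as a one-hot feature vector."""
    phi = np.zeros(n_states)
    phi[state] = 1.0
    return phi

def td_lambda_update(w, xi, phi, reward, phi_next,
                     alpha=0.1, gamma=0.95, lam=0.8):
    """One TD(lambda) step with linear function approximation.

    xi is the eligibility trace (a vector here; a matrix when the
    feature representation is two-dimensional, as in footnote 1).
    """
    delta = reward + gamma * w @ phi_next - w @ phi  # TD error
    xi = gamma * lam * xi + phi                      # accumulating trace
    w = w + alpha * delta * xi                       # trace-weighted update
    return w, xi
```

The trace spreads each TD error backwards over recently visited features, which is why it must be stored explicitly between updates.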


Author information


Corresponding author

Correspondence to Viraj Brian Wijesuriya.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Wijesuriya, V.B., Abate, A. (2019). Bayes-Adaptive Planning for Data-Efficient Verification of Uncertain Markov Decision Processes. In: Parker, D., Wolf, V. (eds) Quantitative Evaluation of Systems. QEST 2019. Lecture Notes in Computer Science(), vol 11785. Springer, Cham. https://doi.org/10.1007/978-3-030-30281-8_6


  • DOI: https://doi.org/10.1007/978-3-030-30281-8_6


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30280-1

  • Online ISBN: 978-3-030-30281-8

  • eBook Packages: Computer Science, Computer Science (R0)
